ABSTRACT


Geographic paralogy —analogous to the molec- 
ular phenomenon—has gone unrecognized in cla- 
distic biogeography. It is evidenced by duplication 
or overlap in geographic distribution of taxa re- 
lated by a particular node of a cladogram of or- 
ganisms. Geographically paralogous nodes in- 
crease basally, therefore nonrandomly, in clado- 
grams generally, such that most nodes of complex 
cladograms of organisms are geographically par- 
alogous. A novel algorithm, implemented in a pre- 
liminary MS-DOS program, reduces a more or less 
complex cladogram of organisms to one or more
subtree (area cladogram) that is paralogy free. Sub-
tree analysis of a number of published studies in- 
dicates that geographic data associable with in- 
formative nodes of such subtrees appear to be the 
only data relevant to cladistic biogeography; such 
data, represented as either components or three 
items in a matrix for parsimony analysis, are found 
to be remarkably consistent; most geographic in- 
consistency previously noted in cladistic bioge- 
ography, through parsimony analysis of matrices 
of geographic data, is merely the effect of paralogy. 


INTRODUCTION 


In cladistic biogeography, nodes of a cladogram
of organisms are potentially informative about
relationships among geographic areas occupied by
the organisms. Cladistic biogeographers associate
geographic data with each node, and combine and
interpret the data of all nodes of one or more
cladogram of organisms. Availability of parsimony
programs encouraged their use for such purposes.
The programs, designed to find one or more tree
that best fits a particular sample of data, require
that data be organized in binary form (zeros and
ones), arranged in a matrix of rows corresponding
to areas, and columns (characters) corresponding
to nodes. This approach developed various ways to
associate nodes and geographic data, and various
ways to represent data in a matrix (summaries in
Humphries and Parenti, 1986; Humphries et al.,
1988; Humphries, 1992; Legendre, 1990). We
consider two general notions of data for cladistic
biogeography, apply them to two benchmark studies,
those of Brundin (1966) and Mayden (1988), and
suggest a complementary notion (subtree algorithm)
that offers hope of better results. We illustrate
this possibility through analysis of these two
studies and also of some of those reviewed by Craw
(1989), Page and Lydeard (1994), and Morrone and
Carpenter (1994).


METHODS
COMPUTER PROGRAMS


Programs used in analyses include Hennig86
(Farris, 1988), PAUP (Swofford, 1993), and those
in the current TAX package (Nelson and Ladiges,
1995). Hennig86 and PAUP are programs used for
parsimony analysis of matrices. Hennig86 was used
for matrices with fewer than 1000 characters and
for search for minimal trees (see below); and
PAUP, for matrices with more than 1000 characters.
The TAX package includes programs (TAX, TAS) used
for preparation of three-item matrices for analysis
by Hennig86 and PAUP. The package includes also a
preliminary program (TASS) for subtree analysis
(used to enumerate subtrees and prepare both
component and three-item matrices for subtrees);
and a utility program (TAXUTIL) for analysis of
tree files (used to enumerate nodes per tree in a
file and three-item statements per tree, and to
select trees from a file).


SUBTREE ALGORITHM 


As described and exemplified below, and
implemented in the program TASS, the subtree
algorithm builds subtrees starting at each
terminal node and progressing to the base of a
cladogram of organisms. A node (taxon) that
relates organisms that, as different taxa, do not
overlap in geographic distribution is associated
with the nonoverlapping geographic data. A node
(taxon) that relates organisms that, as different
taxa, overlap is deemed paralogous and is not
generally associated with geographic data, except
in the following case: if a node leads directly to
one or more terminal taxon that is geographically
widespread, and part of that distribution overlaps
with that of another taxon, or taxa, then the
widespread distribution is reduced to the
nonoverlapping geographic element. An example is
that of five taxa with distributions in areas A,
AB, B, BC, D, and related by the cladogram
((A AB)(B BC))D. All three informative nodes of
the cladogram are paralogous in the strict sense,
yet one node evidently relates areas A and B more
closely than to D, and another node evidently
relates areas B and C more closely than to D. The
algorithm reduces the widespread distributions AB
and BC to B and C, respectively, in accordance
with assumption 2 (Nelson and Ladiges, 1991b). The
cladogram with reduced distributions yields two
subtrees, (AB)D and (BC)D, which combine as
(ABC)D. The same result is obtained from the
cladogram ((A AB)(B BC))BD.


Fig. 1. Six cladograms (Examples 1-6) of organisms
in areas A, B, and C, with data for nodes
represented in component and three-item matrices
(all-zero outgroup assumed).
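The reduction of widespread distributions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the TASS implementation: the function name `reduce_widespread` and the representation of each terminal distribution as a set of area letters are hypothetical choices made here.

```python
def reduce_widespread(terminals):
    """Reduce each widespread terminal to its nonoverlapping element.

    `terminals` is a list of sets of area labels for the terminal taxa
    to which one node leads directly (hypothetical representation).
    """
    reduced = []
    for i, areas in enumerate(terminals):
        # Areas occupied by the other terminals of the same node.
        others = set().union(*(t for j, t in enumerate(terminals) if j != i))
        remainder = areas - others
        # Only a widespread distribution (more than one area) is reduced,
        # and only when a nonoverlapping element remains.
        if len(areas) > 1 and remainder:
            reduced.append(remainder)
        else:
            reduced.append(set(areas))
    return reduced


# The two nodes (A AB) and (B BC) of the cladogram ((A AB)(B BC))D.
left = reduce_widespread([{"A"}, {"A", "B"}])
right = reduce_widespread([{"B"}, {"B", "C"}])
print(left, right)
```

Under these assumptions AB is reduced to B and BC to C, so that the two nodes carry the data (AB) and (BC) that the text combines with D.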
MINIMAL TREES


A matrix with missing data invites existing 
parsimony programs to find trees that have 
shortest length but are overresolved for the 
data of the matrix (Platnick et al., 1991; Nel- 
son and Ladiges, 1993). Current parsimony 
programs save such trees in computer mem- 
ory if the trees do not conflict with data, even 
though most of their nodes might be unsup- 
ported by data. Saving overresolved trees is 
a defect of current programs, but the defect 
can be overcome through search for a least- 
resolved (minimal) tree. Search for minimal 
trees was conducted with the xx function of
Hennig86. After entering the appropriate matrix
and the appropriate tree, nodes were collapsed,
and areas moved to a more basal position in the
tree, one at a time and in combination on a
trial-and-error basis, in a search for the least
resolved tree with shortest length. A tree was
judged minimal when further collapse increased
tree length.


GEOGRAPHIC DATA (FIG. 1) 


There are two basic notions about geo- 
graphic data associable with nodes of a clado- 
gram relating organisms. Both notions con- 
cern data organized as 0-entries and 1-entries 
in a matrix. We refer to the data of these 
notions as “components” (Nelson and Plat- 
nick, 1981: 169) and “three items” (Nelson, 
1992; Nelson and Ladiges, 1993) and illus-
trate some differences between them in six
examples (fig. 1). 


EXAMPLE 1 


Consider an example of organisms that live 
in geographic areas A, B, and C. Organisms 
of areas B and C (node 1) are related more 
closely than to organisms of area A. Both 
nodes (0 and 1) have components associable 
with them, whereas only one node (1) has 
three items. 

The rationale for components is to follow 
each node to all branch tips to which it leads 
and to sum the data at the tips. Node 0 leads 
to three tips with data A + B + C; hence for 
node 0 of the matrix, each area receives the 
value 1. Node 1 leads to two tips with data 
B + C; hence for node 1 of the matrix, area 
A receives the value 0, and areas B and C 
receive the value 1. 

The rationale for three items is to see each 
node as a relation (connection) between the
branches (and their tips) to which it leads,
relating these branches more closely than to
other branches (and tips) of the tree. If there
were another area, e.g., area D, at the tip of
a more basal branch, then node 0 would relate
area A and areas B and C more closely than 
to area D. Because no such area D is included 
in the tree, node 0 does not function as a 
relation (connection) with that significance, 
and node 0 is without data. In contrast, node 
1 relates areas B and C more closely than to 
areas at the tips of other branches. Here the 
only such area is A. Hence for node 1 of the 
matrix, area A receives the value 0, and areas 
B and C receive the value 1. 

With an all-zero outgroup, parsimony 
analysis of each matrix yields one tree, A(BC) 
with consistency index (ci) 100. 


EXAMPLE 2 


Organisms occur in areas A, B, and C, with 
two sorts of organisms (C1 and C2) in area 
C. Two sorts of organisms living in area C 
are related (node 2) more closely than to or- 
ganisms of areas A and B, and are related 
(node 1) more closely to organisms of area B 
than to organisms of area A. All three nodes 
have component data. Only node 1 has three-
item data, but it has twice as much data as
node 1 of Example 1 (see above). Parsimony
analysis of each matrix yields one tree, A(BC), 
with ci 100. 

The rationale for components is the same 
as that of Example 1, with the added factor 
of multiple occurrence, or redundancy, in area 
C. Node 0 leads to four branch tips with data 
A + B + C + C. Redundancy in the data is
simply eliminated, leaving the data for node 
0 as ABC. Similarly, node 1 leads to three 
branch tips, with data B + C + C, reducing 
to BC. Node 2 leads to two branch tips with 
data C + C, reducing to C. 
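The component rationale of Examples 1 and 2 can be sketched as follows. This is a minimal illustration, assuming a cladogram written as nested tuples whose tips are single area letters and whose nodes are numbered in preorder from the base; the names `tip_areas` and `component_matrix` are hypothetical and are not part of any program cited in the text.

```python
def tip_areas(node):
    """Return the set of areas at all tips to which a node leads."""
    if isinstance(node, str):
        return {node}
    return set().union(*(tip_areas(child) for child in node))


def component_matrix(tree):
    """Return (areas, columns) with one 0/1 column per node, in preorder."""
    areas = sorted(tip_areas(tree))
    columns = []

    def visit(node):
        if isinstance(node, str):
            return
        # Summing the tips and dropping redundancy is a set union.
        present = tip_areas(node)
        columns.append([1 if area in present else 0 for area in areas])
        for child in node:
            visit(child)

    visit(tree)
    return areas, columns


example1 = ("A", ("B", "C"))          # A(BC)
example2 = ("A", ("B", ("C", "C")))   # A(B(C1 C2))
for tree in (example1, example2):
    areas, columns = component_matrix(tree)
    for row, area in enumerate(areas):
        print(area, "".join(str(column[row]) for column in columns))
```

Because the tip data are held in a set, the multiple occurrence of area C in Example 2 disappears without any separate step, which is the elimination of redundancy described in the text.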

The rationale for three items sees geo- 
graphic data in a different way. Node 0 is 
without data (as in Example 1). Node 1 re-
lates area B and area C (C1) more closely than
to area A, and again relates area B and area 
C (C2) more closely than to area A; hence in 
the matrix, two columns (characters) are re- 
quired for node 1. Node 2 cannot logically 
relate area C (C1) and area C (C2) more close- 
ly than to areas A and B; hence, node 2 is 
seen as without data. Relationship is a con-
nection between different areas; to claim that
an area is related, or connected, to itself more 
closely than to another area is logically ab- 
surd. 
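The three-item rationale can be sketched in the same way. The sketch below rests on assumptions made here for illustration: tips carry a single area letter, nodes are numbered in preorder from the base, each node contributes one statement for every pair of tips drawn from two of its different branches and every tip outside the node, and a statement is kept only if its three areas are all different. The function names are hypothetical; this is not the code of TAX or TASS.

```python
from itertools import combinations, product


def tips(node):
    """List the area labels at all tips below a node (duplicates kept)."""
    if isinstance(node, str):
        return [node]
    return [tip for child in node for tip in tips(child)]


def three_item_statements(tree):
    """Return a list of (node_number, (x, y), z), each read as (x y)z."""
    statements = []
    counter = [0]

    def visit(node, outside):
        if isinstance(node, str):
            return
        number = counter[0]
        counter[0] += 1
        branches = [tips(child) for child in node]
        for left, right in combinations(branches, 2):
            for x, y in product(left, right):
                for z in outside:
                    # An area cannot be related to itself more closely
                    # than to another area, so repeated areas are dropped.
                    if len({x, y, z}) == 3:
                        statements.append((number, tuple(sorted((x, y))), z))
        for i, child in enumerate(node):
            siblings = [t for j, b in enumerate(branches) if j != i for t in b]
            visit(child, outside + siblings)

    visit(tree, [])
    return statements


example2 = ("A", ("B", ("C", "C")))   # A(B(C1 C2))
for number, pair, out in three_item_statements(example2):
    print("node", number, "(" + "".join(pair) + ")" + out)
```

For Example 2 the sketch gives two statements (BC)A for node 1 and none for nodes 0 and 2, in agreement with the two columns required for node 1.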


EXAMPLE 3 


Organisms occur in areas A, B, and C, with 
two sorts of organisms (A1 and A2) in area
A. Organisms of areas B and C are related 
(node 2) more closely than to organisms of 
area A, but are related (node 1) to one sort 
of organisms (A2) of area A more closely than 
to the other sort (A1). All three nodes have 
component data. Only node 2 has three-item 
data, but it has twice as much data as node 
1 of Example 1 (see above). Parsimony anal- 
ysis of each matrix yields one tree, A(BC), 
with ci 100. 

The rationale for components is similar to 
that of Example 2. Node 0 leads to A + A
+ B + C, reducing to ABC, and so on. The
rationale for three items sees node 1 as with-
out data: node 1 cannot logically relate area
A (A2) and areas B and C more closely than 
to area A (A1). Node 2 relates areas B and C 
more closely than to area A (A2), and again 
relates areas B and C more closely than to 
area A (A1); hence in the matrix, two col-
umns (characters) are required for node 2.


EXAMPLE 4 


Organisms occur in areas A, B, and C, with 
two sorts of organisms (C1 and C2) in area 
C. Organisms of one sort (C1) of area C and 
those of area B are related (node 2) more 
closely than to those of area A and to those 
of the other sort (C2) of area C, and are related
(node 1) more closely to those of area A than 
to the other sort (C2) of area C. All three 
nodes have component data (with exactly the 
same characters as Example 3). Only nodes 
1 and 2 have three-item data. Parsimony 
analysis of the component matrix yields one 
tree, A(BC), with ci 100. Parsimony analysis 
of the three-item matrix yields two trees, 
(AB)C and A(BC), with ci 66. 

The rationale for components is similar to 
that of preceding examples. In the rationale 
for three items, node 1 relates areas A and B
more closely than to area C (C2), but cannot 
logically relate areas A and C (C1) more close- 
ly than to area C (C2). Node 2 relates areas 
B and C (C1) more closely than to area A, 
but cannot logically relate areas B and C (C1)
more closely than to area C (C2). Three-item 
data conflict, (AB)C versus A(BC), and com- 
ponent data do not. Parsimony analysis of 
the three-item matrix does not resolve the 
conflict because there is no preponderance of 
one pattern. Both trees, (AB)C and A(BC), 
have length 3 for the three-item matrix. 


EXAMPLE 5 


Organisms occur in areas A, B, and C, with 
three sorts of organisms (C1, C2, C3) in area 
C. Organisms of area B and one sort (C1) of 
area C are related (node 2) more closely than 
to those of area A and to the other sorts (C2 
and C3) of area C, and are related (node 1) 
more closely to those of area A than to the 
other sorts (C2 and C3) of area C. All three 
nodes have component data (with exactly the 
same characters as Examples 3 and 4). Only 
nodes 1 and 2 have three-item data. Parsi-
mony analysis of the component matrix yields
one tree, A(BC), with ci 100. Parsimony anal- 
ysis of the three-item matrix yields one tree, 
(AB)C, with ci 75. 

The rationale for components and the ra- 
tionale for three items are similar to those of 
the preceding examples. In this example, 
three-item data conflict, (AB)C versus A(BC), 
and component data do not. Parsimony anal- 
ysis of the three-item matrix resolves the con- 
flict because there is a preponderance of one 
pattern, (AB)C. Tree (AB)C has length 4, and 
tree A(BC) has length 5 for the three-item 
matrix. 
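The tree lengths and consistency indices quoted for Examples 4 and 5 can be checked with a short Fitch-style count of steps. This is a sketch only, assuming binary characters, fully resolved trees written as nested tuples, and an all-zero outgroup tip labeled `OG` (a label chosen here); it is not the procedure of Hennig86 or PAUP.

```python
def fitch_steps(tree, states):
    """Return (state_set, steps) for one character on a binary tree."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    current, steps = None, 0
    for child in tree:
        child_set, child_steps = fitch_steps(child, states)
        steps += child_steps
        if current is None:
            current = child_set
        elif current & child_set:
            current = current & child_set
        else:
            # No shared state: one more step is needed at this node.
            current = current | child_set
            steps += 1
    return current, steps


def tree_length(tree, characters):
    """Sum the steps of all characters, adding the all-zero outgroup."""
    return sum(fitch_steps(tree, dict(ch, OG=0))[1] for ch in characters)


def consistency_index(tree, characters):
    """Minimum possible steps divided by observed length."""
    minimum = sum(len(set(dict(ch, OG=0).values())) - 1 for ch in characters)
    return minimum / tree_length(tree, characters)


ab_c = {"A": 1, "B": 1, "C": 0}   # statement (AB)C
bc_a = {"A": 0, "B": 1, "C": 1}   # statement (BC)A
example4 = [ab_c, bc_a]
example5 = [ab_c, ab_c, bc_a]

tree_ab_c = ("OG", (("A", "B"), "C"))   # (AB)C
tree_a_bc = ("OG", ("A", ("B", "C")))   # A(BC)

for name, chars in (("Example 4", example4), ("Example 5", example5)):
    print(name, tree_length(tree_ab_c, chars), tree_length(tree_a_bc, chars))
```

With these inputs both trees have length 3 for Example 4, whereas for Example 5 tree (AB)C has length 4 and tree A(BC) has length 5, so only the second matrix shows a preponderance of one pattern.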


EXAMPLE 6 


Organisms occur in three areas, with three 
sorts of organisms (A1, A2, A3) in area A,
and three sorts (C1, C2, C3) in area C. All
five nodes have component data. Only nodes 
1-4 have three-item data. In the three-item 
matrix, there are two characters for node 1, 
(A1B)C2 and (A1B)C3; one character each 
for nodes 2 and 3, (A2C2)B and (A3C3)B, 
respectively; and three characters for node 4, 
A1(BC1), A2(BC1), and A3(BC1). 

Parsimony analysis of the component matrix
yields one tree, (AC)B, length 6, ci 83,
retention index (ri) 66; with uninformative
characters (0, 1) inactive, length 4, ci 75, ri
66. Parsimony analysis of the three-item matrix
(uniformly or fractionally weighted) yields
one tree, A(BC), length 11, ci 63, ri 47.


SIGNIFICANCE OF EXAMPLES 1-6 


These six examples demonstrate differ- 
ences between component and three-item 
data. Noteworthy is the relative insensitivity 
of component data to differences among 
cladograms of organisms. In Examples 1-5, 
the single informative character of each com- 
ponent matrix is exactly the same, whereas 
the informative character(s) of the three-item 
matrix are generally different. In Examples 
3-5, the entire component matrix is exactly 
the same, whereas the three-item matrix is 
different. In Examples 4 and 5, the compo- 
nent matrix contains no conflict, whereas the 
three-item matrix does, and parsimony anal- 
ysis of the matrices yields different trees. In 
Example 6, both types of matrices contain 
conflict, but the conflict is differently repre- 
sented in the matrices, and parsimony anal- 
ysis of the matrices yields different trees. 

It is pointless to argue that different clado- 
grams (Examples 3-5) are exactly represented 
by the same matrix. Rather, different clado- 
grams imply different matrices for their exact 
representation. From these examples it seems 
that three-item data are more exact than 
component data. It does not follow, however, 
that geographic results of parsimony analysis 
of the three-item matrices are, therefore, gen- 
erally better than results of parsimony anal- 
ysis of the component matrices. It is enough 
to note that three-item data include vari- 
ability that component data sometimes ex- 
clude. In what follows, the cause and signif- 
icance of this variability are considered in 
detail. 


AUSTRAL MIDGES (Brundin, 1966) 


A publication significant in the history of 
cladistic systematics and biogeography is that 
of Brundin (1966). His monograph of species 
of certain groups of midges of the Southern 
Hemisphere contains a general discussion of 
cladistic principles, their application to 
midges, and conclusions about the history in 
time and space of these organisms. His 
monograph began the cladistic approach to 
biogeography. More than any previous publication,
it influenced subsequent development of cladistic
systematics and biogeography.


Fig. 2. Relationships of midges of subfamilies Podonominae and Aphroteniinae (after Brundin, 1966:
fig. 634, modified from 1965: fig. 2; 1967: fig. 2).


Fig. 3. Relationships of midges of subfamily Diamesinae (after Brundin, 1966: fig. 635).

In Brundin’s monograph, there are clado- 
grams of two groups of midges comprising 
three subfamilies and 70 terminal taxa (figs. 
2, 3), each a species or species group, and a 
summary of their geographic relationships 
(fig. 4). Brundin later (1970: fig. 1; 1972: fig. 
4; 1975: fig. 1; 1988: fig. 11.2) offered a sec- 
ond summary (fig. 5). Still later (1972: figs. 
5-6; 1975: figs. 2-3) he offered a third sum- 
mary (fig. 6). 

In Brundin’s first summary, there are two
geographic patterns (fig. 7, P1 and P2). In P1,
southern South America relates to New Zealand,
and these two areas relate to southern Africa;
in P2, southern South America relates to
southeastern Australia, and these two areas
relate to southern Africa. If P1 and P2 are
combined, then three of the four areas (South
America, New Zealand, Australia) relate among
themselves, and as a group relate to southern
Africa (fig. 7, P1 + 2; Nelson and Ladiges,
1991b: table 6, Example 2).
Brundin’s second summary (fig. 7, S2) shows
that Australia and South America 2 (node 4) are
related more closely than to South America 1,
these three areas are related (node 3) more
closely than to New Zealand, these four areas
are related (node 2) more closely than to Africa
and Laurasia, which are related (node 1) among
themselves. Laurasia aside, Brundin’s second
summary, containing two nodes (3 and 4) that
relate Australia and South America more closely
than to New Zealand, differs from the combination
of the two patterns. Even so, the second summary
might be more accurate than the two patterns.


Fig. 4. “Main pattern of transantarctic relationships, as evidenced by chironomid midges. The black
arcs connect the inferred Mesozoic main nodes of evolution and dispersal of the paleoaustral element.
The arrows indicate directions of dispersal before the disruption of Gondwanaland” (after Brundin,
1966: fig. 636).


Fig. 5. “The connection between phylogenetic relationship, relative age, and geographical
distribution in cold-adapted chironomid midges of austral origin. Circles with attached arrows indicate
the multiple occurrence of accordant transantarctic connections within a monophyletic group. The
different evolutionary and biogeographical role played by East and West Antarctica after the
separation of South Africa from the other southern lands in the Upper Jurassic is also indicated” (after
Brundin, 1970: fig. 1; 1972: fig. 4; 1975: fig. 1; 1988: fig. 11.2).


Fig. 6. “Circum-Antarctic distribution and inferred transantarctic relationships.” Above: left
(subfamily Podonominae), “A: the tribe Boreochlini; B, the tribe Podonomini. The phylogenetic diagram
(within the frame of ‘B’) refers to the situation in the genus Podonomus, where the species group of New
Zealand is plesiomorphic and the species group of Australia is apomorphic in relation to the corresponding
sister groups in South America”; right (subfamily Diamesinae), “C: the Diamesae; D: the Heptagyiae.
The phylogenetic diagram (within the frame of ‘D’) refers to the situation in the tribe Heptagyini, where
the group of New Zealand (genus Maoridiamesa) and the group of Australia (the tonnoiri group of the
genus Paraheptagyia) are both apomorphic in relation to the corresponding South American sister group”
(after Brundin, 1972: figs. 5, 6; 1975: figs. 2, 3). Below, portions of combined cladograms (fig. 8)
represented above.


PARSIMONY ANALYSIS: NODES OF 
COMBINED MIDGE CLADOGRAMS (FIG. 8) 


Brundin’s two cladograms (figs. 2, 3) can
be combined via a basal node as one cladogram
of 69 nodes and 70 terminal taxa (fig. 8),
distributed in Australia (A), Africa (F),
Laurasia (L), South America (S), and New
Zealand (Z). Table 1 is the component matrix
for all 69 nodes, most of which are variably
redundant (tables 2, 3). Parsimony analysis
of this matrix yields one tree (fig. 7, Comp),
similar to Brundin’s second summary (see
above and fig. 7, S2).

Three-item analysis of nodes of the com- 
bined cladograms (fig. 8) yields 17,051 state- 
ments. Parsimony analysis of a uniformly 
weighted matrix yields one tree (fig. 7, UW). 
Parsimony analysis of a fractionally weighted 
matrix (×10) yields one tree (fig. 7, FW ×10).

Results of parsimony analysis of the matrices
for the nodes of the combined cladograms differ
(fig. 7, Comp, UW, FW ×10). These different
results for component and three-item data (with
uniform and fractional weighting) indicate that
the data of the combined cladograms are
ambiguous. The source of this ambiguity is
geographic paralogy.


Fig. 7. Relationships of austral areas, determined
by various means. P1 and P2, the two patterns of
Brundin’s first summary (fig. 4). P1 + 2,
combination of patterns P1 and P2. S2, Brundin’s
second summary (fig. 5). Comp, result (one tree)
of parsimony analysis of component matrix (table
1) for 69 nodes of combined cladograms (fig. 8),
and also of component and three-item matrices
(tables 4, 5) for 19 nodes of 16 subtrees (fig. 9);
for matrix of table 1, length 79, ci 87, ri 84;
with uninformative characters rendered inactive
(characters 0-3, 11, 21, 31, 41-44, 50-55, 57, 61,
67), the reduced matrix yields the same tree,
length 59, ci 83, ri 84; for matrices of table 4,
length 19, ci 100, ri 100; for uniformly weighted
matrices of table 5 (44 statements), length 44, ci
100, ri 100; for fractionally weighted matrices of
table 5, length 44 × any factor, ci 100, ri 100.
UW, result (one tree) of parsimony analysis of
uniformly weighted three-item matrix (17,051
statements) for 52 informative nodes of combined
cladograms (fig. 8), length 26441, ci 65, ri 50.
FW ×10, result (one tree) of parsimony analysis of
fractionally weighted (×10) three-item matrix for
52 informative nodes of combined cladograms (fig.
8), length 260722, ci 65, ri 50. CTComp1 and
CTComp2, results (two trees) of parsimony analysis
of component matrix (table 6) for 23 nodes of
combined subtrees (fig. 10), length 30, ci 76, ri
72. Con2, strict consensus of two trees (CTComp1
and CTComp2). CTUW1, CTUW2, CTUW3, results (three
trees) of parsimony analysis of uniformly weighted
three-item matrix (1682 statements) for 22
informative nodes of combined subtrees (fig. 10),
length 2564, ci 65, ri 47. Con3, strict consensus
of three trees (CTUW1, CTUW2, CTUW3). CTFW ×10,
result (one tree) of parsimony analysis of
fractionally weighted (×10) three-item matrix for
22 informative nodes of combined subtrees (fig.
10), length 24400, ci 66, ri 48. S2A, Brundin’s
second summary, modified to conform exactly with
relationships of austral areas as he saw them.


GEOGRAPHIC PARALOGY


Paralogy is a term used by molecular biologists
to refer to comparison (or the relation) between
copies of the same gene within a genome.
Geographic paralogy, analogous with the molecular
phenomenon (Nelson and Ladiges, 1991b: 481; Page,
1993), is duplication or overlap in geographic
distribution among related taxa (hereinafter the
term paralogy refers to geographic paralogy). A
cladogram node is paralogous when it relates
organisms with geographic distributions that
overlap to any degree, and such distributions are
themselves paralogous (cf. the “redundant” nodes
of Page, 1988: 269, 1994: 65). In four of the five
examples above (fig. 1), there is at least one
paralogous node: node 2 of Example 2, node 0 of
Example 3, node 0 of Example 4, node 0 of Example
5. In Brundin’s combined cladograms (fig. 8),
there are many (47) paralogous nodes (table 2).
There are several possible causes of paralogy
in the sense of overlapping geographic
distributions of taxa: tectonics, dispersal,
sympatric speciation, mistaken relationships
among organisms, imprecise characterization of
geographic areas, and so on. It is evident,
however, that in cladograms of many taxa paralogy
generally increases toward the base of the
cladogram, such that, beyond a certain point, all
more basal nodes are likely to be paralogous. Such
is evident in Brundin’s combined cladograms (fig.
8; see below).


Fig. 8. Combination of Brundin’s two cladograms (figs. 2, 3). Parochlus kiefferi (terminal taxon 16
with a Laurasian distribution), which Brundin arbitrarily placed as the sister-species of P. maorii of New
Zealand (cf. Brundin, 1963: fig. 4), is listed as a New Zealand taxon; elsewhere (Brundin, 1965: fig. 2;
1967: fig. 2), P. kiefferi is shown as sister to a group including terminal taxa 17-19, and still elsewhere
Brundin (1976: 148) states that P. kiefferi, once seen with “a clearly secondary occurrence in the
Holarctic,” is now part of a “changed ... picture” (cf. Brundin, 1981: 117-118). Terminal taxa 1-47 are as
in figure 2 (40-47 here are 38-45 in fig. 2). Terminal taxa 48-70 are as in figure 3 (1-23 in fig. 3).
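The definition of a paralogous node given above can be sketched as a simple overlap test. The sketch assumes, for illustration only, that a cladogram is written as nested tuples, that each tip is a string of single-letter area codes (so that "AB" is a taxon widespread in areas A and B), and that nodes are numbered in preorder from the base; `paralogous_nodes` is a hypothetical name, not a function of TASS.

```python
from itertools import combinations


def areas_below(node):
    """Return the set of area letters at all tips below a node."""
    if isinstance(node, str):
        return set(node)
    return set().union(*(areas_below(child) for child in node))


def paralogous_nodes(tree):
    """Return preorder numbers of nodes whose branches overlap in area."""
    found = []
    counter = [0]

    def visit(node):
        if isinstance(node, str):
            return
        number = counter[0]
        counter[0] += 1
        branch_areas = [areas_below(child) for child in node]
        # A node is paralogous if any two of its branches share an area.
        if any(a & b for a, b in combinations(branch_areas, 2)):
            found.append(number)
        for child in node:
            visit(child)

    visit(tree)
    return found


print(paralogous_nodes(("A", ("B", ("C", "C")))))   # Example 2
print(paralogous_nodes(("A", ("A", ("B", "C")))))   # Example 3
print(paralogous_nodes(((("A", "AB"), ("B", "BC")), "D")))
```

Under these assumptions the test flags node 2 of Example 2 and node 0 of Example 3, and flags all three informative nodes of the cladogram ((A AB)(B BC))D, which are paralogous in the strict sense before widespread distributions are reduced.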


SUBTREE ALGORITHM AND COMBINED 
MIDGE CLADOGRAMS (FIG. 9) 


If within cladograms generally, paralogous 
nodes are nonrandomly distributed, then a 


data matrix might be improved if paralogous 
nodes were identified so that no geographic 
data were associated with them. We suggest 
an algorithm for that purpose. The objective 
of the algorithm is to reduce a cladogram, 
such as that of Brundin’s combined clado- 
grams, to one or more subtree that is paralogy 
free. The algorithm begins with a list of ter- 
minal nodes that are nonparalogous and geo- 
graphically informative. The algorithm builds 
subtrees, starting with each terminal node, 
progressing basally node by node, and incor- 
porating nonparalogous geographic data for 
each node while ignoring any paralogous data
(as indicated by overlap of geographic areas). 
Of the six examples above (fig. 1, Examples 
1-6), each reduces to one and the same sub- 
tree, A(BC). 

Brundin’s combined cladograms (fig. 8) re- 
duce to 16 subtrees (fig. 9), which together 
include 19 informative nodes. Of the 69 nodes 
of the combined cladograms (fig. 8), 47 are 
paralogous. Three nodes (3-5) are in them- 
selves uninformative, but each of the three 
functions as a basal node of more than one 
subtree (table 2). 


PARSIMONY ANALYSIS: NODES OF
INDIVIDUAL MIDGE SUBTREES (FIG. 9)


Nodes of individual subtrees may be rep- 
resented in either a component or a three- 
item matrix. For the 19 nodes of the subtrees, 
a component matrix (table 4, left) has 19 
characters. Parsimony analysis of this matrix 
yields one tree, length 19, ci 100, ri 100. The 
same result is obtained for a matrix of 16 
characters (including three multistate char- 
acters), each representing one subtree (table 
4, right). Three-item analysis of the 19 nodes 
yields 44 statements (table 5). Parsimony 
analysis of a uniformly weighted matrix yields 
one tree, length 44, ci 100, ri 100. Parsimony 
analysis of a fractionally weighted matrix (any 
factor) yields one tree, length 44 × any factor,
ci 100, ri 100. However represented, the data
yield one and the same tree (fig. 7, Comp),
similar to Brundin’s second summary (fig. 7,
S2).

For nodes of subtrees, parsimony analysis 
of both types of matrices (component and 
three-item) yields the same tree with 100% 
consistency, suggesting that geographic par- 
alogy is the sole cause of the variable results 
of parsimony analysis of matrices (compo- 
nent and three item) of nodes of combined 
cladograms (see above). 

Parsimony analysis of matrices for the nodes
of the individual subtrees yields a tree (ci 100)
the same as those (ci 83-87) yielded by parsimony
analysis of the component matrix for nodes of
combined cladograms (see above), suggesting that
in this case component data, in contrast to
three-item data, reduce the effects of paralogy
within combined cladograms. Such is not a general
property of component data, even though component
data see no difference between different trees
(Examples 3-5), or between nodes with different
redundancy (tables 2, 3). Blind to some aspects of
variation in tree structure, component data are
perforce blind to some paralogy.


TABLE 1
Component Matrix for 69 Nodes of Brundin’s
Combined Cladograms (fig. 8)
Nodes of Combined Cladograms
Symbols: OG, Outgroup; A, Australia; F, Africa; L,
Laurasia; S, South America; Z, New Zealand.

Consider again Example 6 (fig. 1). All five
nodes have component data. Node 0 is paralogous.
Its component data, like those of all paralogous
nodes, are redundant: A + A + A + B + C + C + C,
reducing to ABC. Eliminating redundancy, however,
does not eliminate paralogy from the component
matrix. All comparisons between nodes 1-3 involve
node 0 and are paralogous. Without paralogy, there
are no data for nodes 1-3. Subtree analysis sees
node 0 as paralogous and nodes 1-3 as without data
and yields a single subtree, A1(BC1).

Example 6 shows weakness of component 
data of paralogous nodes. The cladogram in- 
cludes conflict, variously captured by com- 
ponent and three-item data. Parsimony anal- 
ysis of both types of matrices yields different 
trees with less than 100% consistency. For 
the component matrix, the result, determined 
by paralogy, differs from that of subtree anal-
ysis.


COMBINED MIDGE SUBTREES (FIG. 10) 


The 16 subtrees can be combined in one 
tree (fig. 10), which contains all 19 nonpar- 
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Component Data for Nodes (0-68) of Brundin’s Combined Cladograms (fig. 8) and Subtrees (fig. 9) 


Node 


0 1 2 3 4 5 6 7 8


Taxa 


1-70 

1-47 
48-70 

1-41 
42-47 
48-67 
68-70 

1-37 
38-41 
43-47 
48-66 
69-70 

1-25 
26-37 
39-41 
43-44 
45-47 
48-65 

1-24 
26-29 
30-37 


Areas 


AFLSZ+ 
AFLSZ+ 
AFLSZ+ 
AFLSZ+ 
AFS+ 
ALSZ+ 
FL+ 
ASZ+ 
FL+ 


Combined 
cladograms 


(AFLSZ)*& 
(AFLSZ)* 
(AFLSZ)* 
(AFLSZ)* 
LZ(AFS) 
F(ALSZ) 
ALS(FL) 
FL(ASZ) 
ASZ(FL) 
FLZ(AS) 
FL(ASZ) 
AFSZ(L)*& 
FL(ASZ) 
FL(ASZ) 
ASZ(FL) 
FLZ(AS) 
FLZ(AS) 
FL(ASZ) 
FL(ASZ) 
FL(ASZ) 
FLZ(AS) 
ALSZ(F)*& 
FLZ(AS) 
FLZ(AS) 
AFL(SZ) 
FL(ASZ) 
FL(ASZ)
FLZ(AS) 
FLZ(AS) 
FLZ(AS) 
FLZ(AS) 
AFLZ(S)*& 
AFL(SZ) 
FLZ(AS) 
FL(ASZ) 
FLZ(AS) 
FLZ(AS) 
FLZ(AS) 
FLZ(AS) 
FLZ(AS) 
FLZ(AS) 
AFLZ(S)*& 
AFLZ(S)*& 
AFLZ(S)*& 
AFLS(Z)*& 
FLZ(AS) 
AFL(SZ) 
FL(ASZ) 
FLZ(AS) 
FLZ(AS) 
FLSZ(A)*& 
AFLZ(S)*& 
AFLZ(S)*& 


Subtrees 


Subtree 
number 
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TABLE 2—(Continued ) 


Node  Taxa   Areas  Combined cladograms  Subtrees  Subtree number

53    61-62  Z+     AFLS(Z)*&            p         —
54    63-65  Z+     AFLS(Z)*&            p         —
55    2-3    S+     AFLZ(S)*&            p         —
56    4-5    AS     FLZ(AS)              FLZ(AS)   1
57    7-8    S+     AFLZ(S)*&            p         —
58    9-10   SZ     AFL(SZ)              FL(SZ)    2
59    11-19  ASZ+   FL(ASZ)              p         —
60    20-21  SZ     AFL(SZ)              FL(SZ)    6
61    63-64  Z+     AFLS(Z)*&            p         —
62    12-19  ASZ+   FL(ASZ)              p         —
63    12-15  SZ+    AFL(SZ)              p         —
64    16-19  ASZ+   FL(ASZ)              FL(ASZ)   5
65    12-13  SZ     AFL(SZ)              FL(SZ)    3
66    14-15  SZ     AFL(SZ)              FL(SZ)    4
67    16-17  Z+     AFLS(Z)*&            p         —
68    18-19  AS     FLZ(AS)              FLZ(AS)   5
ᵃ Symbols: +, multiple occurrence in one or more area (see table 3); *, node geographically uninformative
(component data); &, node uninformative (three-item data); p, paralogous node; and —, node unrepresented
in any subtree. Other symbols as in table 1.


alogous nodes—the paralogy-free fraction of 
the geographic data of the combined clado- 
grams (fig. 8). The combined subtrees (fig. 10) 
include also the three basal nodes (which 
thereby become paralogous), as well as one 
paralogous (and uninformative) node basal 
to the entire tree. The 19 informative nodes 
of the subtrees are in distal positions in the 
combined cladograms (fig. 11), in accord with 
the expectation that paralogy increases ba- 
sally in cladograms (see above). 

Component data for the nodes of the com- 
bined subtrees produce a matrix (table 6) 
similar to that of table 4 (left). There are four 
additional characters for nodes 0, 3, 4, and 
5, and missing-data entries (question marks) 
are replaced by 0-entries resulting from par- 
alogous comparisons. Parsimony analysis of 
the component matrix (table 6) yields two 
trees (fig. 7, CTComp1 and CTComp2, with
strict consensus, Con2). Three-item analysis 
of the nodes of the combined subtrees (fig. 
10) yields 1682 statements. Parsimony anal- 
ysis of a uniformly weighted matrix yields 
three trees (fig. 7, CTUW1, CTUW2, 
CTUW3, with strict consensus, Con3). Par- 
simony analysis of a fractionally weighted 
matrix (x10) yields one tree (fig. 7, 
CTFW x10). Results of these analyses of 


nodes differ (fig. 7, CTComp1, CTComp2,
Con2, CTUW1, CTUW2, CTUW3, Con3, 
CTFWx10). Such are the ambiguous effects 
of paralogy, as variously captured by com- 
ponent and three-item data for nodes of com- 
bined subtrees. Individual subtrees are par- 
alogy free, but their combination (fig. 10) in- 
troduces paralogy at the connecting nodes 
(nodes 0, 3, 4, 5). 
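
The expansion of a component into three-item statements can be sketched as follows. This is a minimal illustration, not the procedure of the programs cited: the function name and the string notation "F(AS)" are assumptions, and a component is taken to imply that every pair of areas inside it is more closely related than either is to each area outside it.

```python
from itertools import combinations

def three_item_statements(all_areas, group):
    """Expand one component into three-item statements.

    all_areas: area symbols in the analysis, e.g. "AFLSZ"
    group:     areas united by the node, e.g. "ASZ" for FL(ASZ)
    Returns strings such as "F(AS)": A and S are more closely
    related to each other than either is to F.
    """
    inside = sorted(set(group))
    outside = sorted(set(all_areas) - set(group))
    return [f"{out}({a}{b})"
            for out in outside
            for a, b in combinations(inside, 2)]

# Component FL(ASZ) over the five midge areas
print(three_item_statements("AFLSZ", "ASZ"))
# A node with all areas, or with a single area, yields no statements
print(three_item_statements("AFLSZ", "AFLSZ"))
print(three_item_statements("AFLSZ", "Z"))
```

Under this reading, nodes marked uninformative in table 2 (all areas present, or one area only) contribute nothing to a three-item matrix.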


CONCLUSIONS 


Some cladogram nodes are geographically 
paralogous. Parsimony analysis of a matrix 
with characters for paralogous nodes can yield 
a tree different from that obtained by parsi- 
mony analysis of a matrix for nodes of in- 
dividual subtrees (the paralogy-free fraction 
of data). With either type of data (compo- 
nents, three items), the effects of paralogy are 
unpredictable. We suggest that the paralogy- 
free fraction contains the only data relevant 
to area relationship in the cladistic sense. 
Subtree analysis and parsimony analysis of a 
matrix for nodes of individual (not com- 
bined) subtrees seem the only exact methods 
presently known to capture and to analyze 
these data. 
TABLE 3
Analysis of Redundancy Eliminated from Component Data for Nodes of Brundin’s Combined
Cladograms (fig. 8)ᵃ

Component                Areas
Data       Node     A    F    L    S    Z   Total
(AFLSZ)    0       11    4    5   34   16    70
           1        9    3    2   23   10    47
           2        2    1    3   11    6    23
           3        7    2    2   20   10    41
LZ(AFS)    4        2    1    0    3    0     6
F(ALSZ)    5        2    0    1   11    6    20
ALS(FL)    6        0    1    2    0    0     3
FL(ASZ)    7        7    0    0   20   10    37
           10       2    0    0   11    6    19
           12       3    0    0   13    9    25
           13       4    0    0    7    1    12
           17       2    0    0   11    5    18
           18       3    0    0   13    8    24
           19       1    0    0    2    1     4
           25       1    0    0    4    1     6
           26       2    0    0    9    7    18
           34       1    0    0    7    7    15
           47       1    0    0    4    6    11
           59       1    0    0    3    5     9
           62       1    0    0    3    4     8
           64       1    0    0    1    2     4
ASZ(FL)    8        0    2    2    0    0     4
           14       0    2    1    0    0     3
FLZ(AS)    9        2    0    0    3    0     5
           15*      1    0    0    1    0     2
           16       1    0    0    2    0     3
           20       3    0    0    5    0     8
           22*      1    0    0    1    0     2
           23       2    0    0    6    0     8
           27       1    0    0    2    0     3
           28       1    0    0    2    0     3
           29       2    0    0    3    0     5
           30       2    0    0    5    0     7
           33       1    0    0    4    0     5
           35       1    0    0    2    0     3
           36*      1    0    0    1    0     2
           37*      1    0    0    1    0     2
           38*      1    0    0    1    0     2
           39       1    0    0    2    0     3
           40       2    0    0    2    0     4
           45       1    0    0    3    0     4
           48*      1    0    0    1    0     2
           49*      1    0    0    1    0     2
           56*      1    0    0    1    0     2
           68*      1    0    0    1    0     2
AFSZ(L)    11       0    0    2    0    0     2
ALSZ(F)    21       0    2    0    0    0     2
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TABLE 3—(Continued ) 


Component Areas 
ᵃ Each node leads to the indicated number of areas:
A (Australia), F (Africa), L (Laurasia), S (South Amer- 
ica), and Z (New Zealand). Asterisk indicates nodes with 
no redundancy: 15, 22, 36-38, 48, 49, 56, 58, 60, 65, 
66, and 68. 
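
The bookkeeping behind table 3 can be sketched in a few lines. The function name and the dictionary input are illustrative assumptions. The sketch takes the number of terminal occurrences per area below a node, writes the component in the notation of table 2 (absent areas outside the parentheses), and flags the node as redundant when any area occurs more than once.

```python
AREAS = "AFLSZ"  # Australia, Africa, Laurasia, South America, New Zealand

def summarize_node(counts):
    """counts: dict mapping area symbol -> occurrences below the node.

    Returns (component, total, redundant).
    """
    present = [a for a in AREAS if counts.get(a, 0) > 0]
    absent = [a for a in AREAS if counts.get(a, 0) == 0]
    total = sum(counts.get(a, 0) for a in AREAS)
    component = f"{''.join(absent)}({''.join(present)})"
    # Redundant when some area is represented more than once
    redundant = total > len(present)
    return component, total, redundant

print(summarize_node({"A": 2, "S": 3}))  # node 9 of table 3
print(summarize_node({"A": 1, "S": 1}))  # node 15, asterisked in table 3
```

Both nodes reduce to the same component, FLZ(AS), which is the sense in which component data do not distinguish nodes of different redundancy.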
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INTERPRETATION OF NODES 


For Brundin, his first and second sum- 
maries are different ways to depict the same 
interpretation (also Brundin, 1974: 294-295). 
There are two patterns: one older, associated 
with West Antarctica and New Zealand (fig. 
7, P1) and the other younger, associated with
East Antarctica and Australia (fig. 7, P2). In 
Brundin’s second summary (fig. 7, S2), node
2 relates New Zealand and West Antarctica 
and node 4 relates Australia and East Antar- 
tica (Brundin, 1966: 452; 1993: 363-364): 


the connections between the southern lands have been 
broken according to a certain sequence beginning with 
the separation of Southern Africa [node 0]. The next 
event was the break in the connections between New 
Zealand and (West) Antarctica [node 2]. The follow- 
ing separation between Tasmania-Australia and (East) 
Antarctica [node 4] antedates, probably quite consid- 
erably, the break between South America and Ant- 
arctica. 
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Fig. 9. Sixteen subtrees derived from Brundin’s combined cladograms (fig. 8). If terminal taxon 16 
(P. kiefferi) is considered the sister of terminal taxon 17, then Subtree 5 alters to (LZ)(AS), which conflicts 
with other Subtrees; if terminal taxon 16 is considered sister of terminal taxa 17-19, then Subtree 5 


alters to L(Z(AS)), which does not conflict (cf. fig 8). 


Curiously, Brundin associates no event with 
node 3, which proves superfluous (no cladis- 
tic method sees South America 1 and South 
America 2 as different). However, Brundin’s 
second summary is an attempt not only to 
combine both patterns, but also to show a 
complex pattern of relationship for New Zea- 
land (Brundin, 1970: 46): 


We see from this [“simple diagram,” i.e., second sum-
mary, but see below] that a group in New Zealand is
the sister group of a group occurring in South America 
[see below; fig. 7, S2A, node 3], or in South America 
+ Australia [see below; fig. 7, S2A, node 4].


In the second summary, there is no node 
showing “that a group in New Zealand is the 
sister group of a group occurring in South 
America” and not in Australia. Presumably
Brundin saw node 3 playing this symplesiom- 
orphic role, associating New Zealand and 
South America 1 (whereas node 3 relates 
South America 1 and Australia more closely


than to New Zealand). An exact depiction of 
Brundin’s interpretation is more complex (fig. 
7, S2A), as suggested by his third summary 
(fig. 6). No known cladistic method would 
yield this tree (fig. 7, S2A, or the second sum- 


TABLE 4 
Component Matrices for Subtrees Derived from 
Brundin’s Combined Cladograms? 


Nodes of Subtrees Subtrees 
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* Left (two-state characters only), matrix for 19 infor- 
mative nodes of 16 subtrees (fig. 9). Right (two-state and 
multistate characters), matrix for 16 subtrees (fig. 9). 
Symbols as in table 1. 
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TABLE 5 
Three-Item Matrices for Subtrees Derived from 
Brundin’s Combined Cladograms? 


Nodes of Subtrees 
1111321111 2 2222 3 333 33 33 4 
4445 9999 25555 2 666 77 88 0 
0G000 0 0000 0 0000 0 000 00 00 0 
A 07? 11177 11177 ? 111 11 11~41 
F 111 0070? 0070? ? 07? 07? 0? ? 
L 1117 7070 ? 27070 070? ?0 70 0 
$§ 707? 17711 129711 1 £11 114121~1 
Z 7270 7 1111 7 111117970 97? «77 7 

4444555 55 66 6666 66 66 666 

88 99 666 88 00 4444 55 66 888 
0G00 00 000 00 00 0000 00 00 000 
A 1111111 797? 7?7 11797 9979 27 111 
F 0? 0? 077 0? 07 070? 0? 0? 07? 
L 70 70 70? ?0 70 7070 270 7?0 720? 
§$ 1111111 11 11 79711 2111111 
Z 72? 797 7970 1111 «2111 1111770 

Subtrees 
0000000 00 00 00 0000000 00 
11111112233 44 §555555 66 

0G0000000 00 00 00 0000000 00 
A 1111177 27? 797? 7277 1111177 #?? 
F 0770707 07? 0? 0? 077070? 0? 
L 7077070 70 270 70 27077070 270 
§ 1117711 11 1111 11177112121 
Z ?701111 1111211 77011121«2121 


77 
0OG00 0000000 00 00 00 000 0000 
ACL YT PPE EL? eld Ad lacO?R? LIT? 
F 07 0770707 0? 0? 07 111 007? 
L ?0 7077070 70 270 70 111 7700 
STs Saeed ee ea 23002 STL 
Pie at Say Zi ak Wg is WR 2 Se se a a 8 es es dca | 


2 Above, matrix for 19 nodes of 16 subtrees (fig. 9). 
Below, matrix for 16 subtrees (fig. 9). These equivalent 
matrices are not derived, nor derivable, from the com- 
ponent matrix of table 4 (left), but from the subtrees of 
figure 9. The matrix for subtrees (below) is derivable also 
from the component matrix of table 4 (right). Symbols 
as in table 1. 


mary) from analysis of Brundin’s cladograms 
(nor is there reason to suggest that he used 
such to develop his second and third sum- 
maries). His cladograms (figs. 2, 3) do not 
differentiate two sorts of areas for South 
America and for New Zealand, as indicated 
by this tree (or two sorts of areas for South 
America, as indicated in his second sum- 
mary). Without such differentiation, the best 
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Fig. 10. Combination of 16 subtrees (fig. 9), 
including the paralogy-free fraction of Brundin’s 
combined cladograms (fig. 8). 


that could be expected is the combination of 
the two patterns of the first summary (fig. 7, 
P1 + 2), or the tree obtained from subtree
analysis (fig. 7, Comp). Similar to and con- 
firming Brundin’s second summary (less the 
superfluous node 3 of the second summary), 
the tree obtained from subtree analysis (fig. 
7, Comp) offers the same two nodes relevant 
to Brundin’s interpretation of West Antarc- 
tica (node 2) and East Antarctica (node 3). 

Brundin (1966: 451) emphasized that for 
New Zealand midges there are “no direct re-
lationships across the comparatively narrow
Tasman Sea.” For other groups of organisms 
such geographic relationship is common- 
place. Current models of the geological evo- 
lution of New Zealand see the modern con- 
dition as derived from collision tectonics me- 
diated by two zones of spreading: an older, 
extinct Tasman ridge and in the Southern 
Ocean the currently active ridge separating 
Antarctica from Australia and New Zealand 
and continuing northward along the western 
margin of South America. 
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Fig. 11. Distribution of informative (@) and basal (b) nodes of subtrees (fig. 9) within Brundin’s 


combined cladograms (fig. 8). 


An alternative to Brundin’s interpretation 
of the above nodes (2 and 3) is that they relate 
to the two spreading ridges: node 2 relates to 
the older Tasman ridge, and node 3 relates 
to the younger and still active ridge of the 
Southern Ocean. 

For Brundin, southern Africa was isolated 
early, before differentiation among midges of 
southern South America, Australia, and New 
Zealand. Subsequently, New Zealand evi- 
dently was isolated early, before differentia- 
tion among midges of southern South Amer- 
ica and Australia. That there is no trace of 
“direct relationships” between midges of 
Australia and New Zealand is no more re- 
markable than their lack between midges of 
Africa and South America. 


TABLE 6
Component Matrix for Paralogy-Free Fraction of
Midge Dataᵃ

Nodes of Combined Subtrees
[matrix entries not recoverable]

ᵃ Matrix for 23 nodes of combined subtrees (fig. 10),
containing paralogy-free fraction of midge data. Symbols
as in table 1.
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Fig. 12. Cladograms of midges and five other groups of organisms (modified from Craw, 1989: fig. 
5): chironomid midges (after Brundin, 1981); land snails of the family Bulimulidae (after Breure, 1979); 
ratite birds (after Mivart, 1877; Sibley and Ahlquist, 1987); southern cedars of the family Cupressaceae
(after Hart, 1987); Nothofagus (after Humphries et al., 1986); Oreobolus (after Seberg, 1988). 


MIDGES AND OTHER ORGANISMS 
(fig. 12) 


Craw (1989) combined some of Brundin’s 
data and those for five other groups of plants 
and animals (fig. 12) occurring in the same 
areas (Australia, New Zealand, South Amer- 
ica, Africa) as well as in New Caledonia (C) 
and New Guinea (G). Treating the six clado- 
grams in effect as individual subtrees, he rep- 
resented most of their nodes (27 of 29) by 
component data. With parsimony analysis of 
the component matrix (table 7, above), he 
found three trees (fig. 13, C1, C2, C3). He 
assessed significance through analysis of con- 
gruence of the three trees (legend to his fig. 
6) with a “geological area cladogram based 
on the conventional breakup sequence of 
Gondwana” (fig. 13, CGC). “A set of 999 
randomly generated cladograms for the six 
areas were calculated under both the equi- 
probable and markovian models and com- 
pared for the distances between them and the 
geological area cladogram” (Craw, 1989: 532).
He found that “congruence between these two
biological area cladograms [trees C2 and C3
but not tree C1] and the geological tree was
just significant at the 5% level” (p. 533). 


PARSIMONY ANALYSIS OF NODES 


COMBINED CLADOGRAMS (FIG. 14) 


The organisms of Craw’s six cladograms 
(fig. 12) relate among themselves (fig. 14) as 


animals (node 1), seed plants (node 2), and 
angiosperms (node 7). Parsimony analysis of 
a component matrix for nodes (33) of the 
combined cladograms (table 7, below) yields 
one tree (fig. 13, Comp). Three-item analysis 
of the nodes of the combined cladograms (fig. 
14) yields 3908 statements. Parsimony anal- 
ysis of a uniformly weighted matrix yields 
two trees (fig. 13, UW1 and UW2). The two 
trees yield a strict consensus (fig. 13, Con2). 
Parsimony analysis of a fractionally weighted 
matrix (x10) yields one tree (fig. 13, FW x 10).

Again, results of parsimony analysis of ma- 
trices for nodes of combined cladograms dif- 
fer (fig. 13, Comp, UW1, UW2, Con2, 
FW x 10), indicating that data are ambiguous 
and that the source of ambiguity is geographic
paralogy. Within the combined cladograms 
(fig. 14) some nodes are paralogous (nodes 0, 
1, 2, 6, 7, 8, 16, 18, 22). Treating the six 
cladograms as individual subtrees (without 
nodes 0, 1, 2, and 7, and with appropriate 
missing data), Craw eliminates some but not 
all paralogy (nodes 0, 1, and 2 are uninform- 
ative in any case). 
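
How a component matrix of the kind shown in table 7 might be assembled can be sketched as follows. The function and its arguments are hypothetical. Each column is a node, each row an area, with an all-zero outgroup row. An area is scored "?" when it is absent from the cladogram to which the node belongs, as when the six cladograms are treated individually.

```python
def component_matrix(areas, node_areas, sampled=None):
    """Build a component matrix as {row label: string of entries}.

    areas:      ordered area symbols (rows)
    node_areas: list of sets, one per node (columns), of areas below the node
    sampled:    optional list of sets, one per node, giving the areas
                covered by the cladogram the node belongs to; areas
                outside that set are scored "?" instead of "0"
    """
    rows = {"OG": "0" * len(node_areas)}  # all-zero outgroup row
    for area in areas:
        entries = []
        for i, present in enumerate(node_areas):
            if area in present:
                entries.append("1")
            elif sampled is not None and area not in sampled[i]:
                entries.append("?")
            else:
                entries.append("0")
        rows[area] = "".join(entries)
    return rows

# Hypothetical cladogram Z(AS): two nodes, Africa not sampled
nodes = [{"A", "S", "Z"}, {"A", "S"}]
covered = [{"A", "S", "Z"}, {"A", "S", "Z"}]
for label, row in component_matrix("AFSZ", nodes, covered).items():
    print(label, row)
```

Omitting the `sampled` argument gives the scoring used for combined cladograms, in which every absence is a 0-entry.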


INDIVIDUAL SUBTREES (FIG. 15) 


Craw’s combined cladograms (fig. 14) re- 
duce to 10 subtrees, which together include 
15 informative nodes (fig. 15). Parsimony 
analysis of a component matrix for the 15 
nodes of the individual subtrees (table 8, 
above), yields two trees (fig. 13, C2 and C3), 
Fig. 13. Relationships of austral areas, determined by various means. C1, C2 and C3, results
(three trees) of parsimony analysis of component matrix (table 7, above) for 27 of 29 nodes of six
trees (fig. 12), length 40, ci 67, ri 51 (after Craw, 1989: fig. 6); C2 and C3 are only trees obtained
[...] item matrices (table 8) for 15 nodes of 12 subtrees, with strict consensus of STCon (see below).
CGC, geological area cladogram (after Craw, 1989: fig. [...] of component matrix (table 7, below)
for 33 nodes of combined cladograms (fig. 14), length 49, ci 67, ri 60. UW1 and UW2, results (two
trees) of parsimony analysis of uniformly weighted three-item [...] of combined cladograms
(fig. 14), length 8103, ci [...]


with strict consensus grouping New Zealand, 
New Caledonia, and New Guinea (fig. 13, 
StCon). Parsimony analysis of uniformly and 
fractionally weighted three-item matrices for 
nodes of the 10 individual subtrees (table 8, 
below) yields the same result (fig. 13, C2 and 
C3). As in the case of Brundin’s combined 
cladograms, informative nodes of subtrees are 
in terminal positions in the 33-node com- 
bination (fig. 16). 


COMBINED SUBTREES (FIG. 17) 


The 10 subtrees combine in one tree (fig. 
17) somewhat simpler (25 versus 33 nodes) 


matrix (4945 statements) for 32 informative nodes 
of combined cladograms (fig. 14), length 69327, ci 
61, ri 36. STCon, strict consensus of two trees
(same as C2 and C3, above) resulting from par- 
simony analysis of component matrix (table 8, 
above) for 15 nodes of 10 subtrees (fig. 15), length 
21, ci 71, ri 64; also of uniformly weighted three- 
item matrix (table 8, below), length 44, ci 77, ri 
70; also of fractionally (x3) weighted 3-item ma-
trix, length 126, ci 76, ri 68. CTComp, result (one
tree) of parsimony analysis of component matrix 
(table 9) for 25 nodes of combined subtrees (fig. 
17), length 41, ci 60, ri 51. CTUW1, CTUW2 and 
CTUW3, results (three trees) of parsimony anal-
ysis of uniformly weighted three-item matrix (1671
statements) for 24 informative nodes of combined 
subtrees (fig. 17), length 2761, ci 61, ri 37. Con3, 
strict consensus of three trees (CTUW1, CTUW2, 
CTUW3). CTFW x 10, result (one tree) of parsi- 
mony analysis of fractionally (x 10) weighted three- 
item matrix (1671 statements) for 24 informative 
nodes of combined subtrees (fig. 17), length 22330, 
ci 62, ri 38. 


than the combination of the original six 
cladograms (fig. 14). Not enough paralogy is 
eliminated from this simpler tree, however, 
for parsimony analysis of a component ma- 
trix for its nodes (table 9) to yield a result 
(fig. 13, CTComp) different from that ob- 
tained above from parsimony analysis of a 
component matrix for the nodes of the 33- 
node combination of six cladograms (fig. 13, 
Comp). Three-item analysis of the nodes of 
the combined subtrees (fig. 17) yields 1671 
statements. Parsimony analysis of a uniform- 
ly weighted matrix yields three trees (fig. 13, 
CTUW1, CTUW2, CTUW3), with strict 
consensus grouping Australia, New Caledo- 
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Fig. 14. Combination of Craw’s (1989) six 
cladograms of midges and other organisms (fig. 
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Fig. 16. Distribution of informative (®) and 
basal (b) nodes of subtrees (fig. 15) within com- 


bination of six cladograms of midges and other 
organisms (fig. 14). 
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Ten subtrees derived from combined cladograms of midges and other organisms (fig. 14). 


TABLE 7
Component Matrices for Craw’s (1989)
Cladogramsᵃ

Nodes of Cladograms
     Midges  Snails  Birds  Cedars  Nothofagus  Oreobolus
     0234    012     0123   012345  1234567     012
OG   0000    000     0000   000000  0000000     000
A    1110    110     1101   110110  1101100     111
F    1000    ???     1010   110000  ???????     ???
S    1111    110     1010   101100  1111000     111
Z    1101    101     1100   101001  1110110     110
C    ????    101     ????   111011  1010011     ???
G    ????    101     1101   101000  1010011     100

Nodes of Combined Cladograms
     000000000011111111112222222222333
     012345678901234567890123456789012
OG   000000000000000000000000000000000
A    111111111101010111111011110011100
F    111101100000110000000000000000000
S    111111111100101111100011111110100
Z    111111111011001011100111101101010
C    111010110010011010001110000100011
G    111011110011001011010010000100011

ᵃ Above, matrix for 27 nodes of Craw’s six cladograms
(fig. 12, modified from Craw, 1989: table 4). Below, ma-
trix for 33 nodes of Craw’s combined cladograms (fig.
14). Craw’s matrix (1989: table 4) omits data for two
nodes (node 1, Midges cladogram; node 0, Nothofagus
cladogram). Symbols: C, New Caledonia; G, New Guin-
ea. Other symbols as in table 1.


nia, and Africa (fig. 13, Con3). Parsimony 
analysis of a fractionally weighted matrix 
(x10) yields one tree (fig. 13, CTFWx10). 
Results of these analyses differ (fig. 13, 
CTComp, CTUW1, CTUW2, CTUW3,
Con3, CTFW x 10). Such, again, are the am- 
biguous effects of paralogy, as variously cap- 
tured by component and three-item data for 
the nodes of the combined subtrees. 


SIGNIFICANCE OF OTHER ORGANISMS 


With addition of groups of organisms oc- 
curring in New Caledonia and New Guinea, 
results of parsimony analysis of the enlarged 
matrix for nodes of subtrees do not contradict 
the results of analysis of the matrix for the 
nodes of subtrees of midges, but indicate 
merely that the added areas (New Caledonia 
and New Guinea) relate to New Zealand. For 
Craw (1989: 533), however, his results supply 
evidence for “three different views of the bio-
geographic classification of New Zealand in
relation to Australia and South America” (the
other two views are based on analyses not of 
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TABLE 8 
Component and Three-Item Matrices for Subtrees 
Derived from Craw’s Cladograms* 


Nodes of Subtrees 
0000000 00000011 
1233 444 6788 9 00 
OGO0 000 000 00000 0 00 
A 1?103110431%?7? 177 «1«21421 
F 0077 0010777? 7? 2? 
S 11100017 0100011 
Z? 10110027? 17 «101 «10 
Cr ? OF 77:7 1 10 11 =2- 272 
G ? ? 01110? 07? 11? 00 
Subtrees 

00 000000000 0000000 

12333333333 4444444 
OGO 0 000000000 0000000 
A 1? 1110702720? 112727111 
F 00777777777 070702? 
§S 11111707070 2702070? 
Z? 1077111177? 1111770 
Cc 7? 7? 2707117711 7999777? 
Goat 2 PP OTALIIDA 22111511 

000 000 00000 0 1111 

444 5 66 7 8888 9 0000 
0G000 0 00 0 0000 0 0000 
Ang OP) Teed LT steko Pcs 421 
FD tet Oe) 2 2h Pe 2 
S 111 7 0? 1000? 07111 
Z?0? 2? 117? 1170 12110? 
Cole) etl, Oe Tal 9 
G ?70 ? 70 ? 27111 7? 0070 


2 Above, component matrix for 15 informative nodes 
of 10 subtrees (fig. 15). Below, three-item matrix for 10 
subtrees (fig. 15). Symbols as in tables 1 and 7. 


nodes of cladograms but of presence/absence 
data of genera and families of plants and an- 
imals). 

Unlike the midge data, nevertheless, there 
is conflict in the data added, not only re- 
garding relationships of the added areas and 
New Zealand but also regarding relationships 
of New Zealand and South America on the 
one hand and relationships of Australia and 
New Guinea on the other (fig. 12): ratite birds 
relate South America and Africa more closely 
than to New Zealand; cedars relate New Zea- 
land and New Caledonia more closely than 
to New Guinea, whereas Nothofagus relates 
New Guinea and New Caledonia more close- 
ly than to New Zealand, and Australia and 
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Fig. 17. Combination of 10 subtrees (fig. 15) 
that includes the paralogy-free fraction of com- 
bined cladograms of midges and other organisms 
(fig. 14). 


New Zealand more closely than to South 
America; Oreobolus relates New Zealand 
more closely to Australia and South America 
than to New Guinea. Such conflict, evident 


TABLE 9 
Component Matrix for Paralogy-Free Fraction of 
Craw’s Data? 
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* Matrix for 25 nodes of combined subtrees, contain- 
ing paralogy-free fraction of data for midges and other 
organisms (fig. 17). Symbols as in tables 1 and 7. 
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Fig. 18. Combination of cladograms for seven 
species groups of North American fishes (after 
Mayden, 1988). Cladograms (nodes): 4, Nocomis 
biguttatus group; 5, Fundulus catenatus group; 7, 
Notropis leuciodus group; 8, Luxilus zonatus group; 
11, subgenus Imostoma of Percina; 17, Etheo-
stoma variatum group; 18, subgenus Ozarkia of
Etheostoma. Other nodes: 0, Teleostei; 1, Cyprin- 
idae; 2, Neoteleostei; 3, Notropis (old usage); 6, 
Percidae; 12, Etheostoma. 


without parsimony analysis of any matrix, 
accords with Craw’s views about the com- 
posite nature of New Zealand. The amount 
of conflict is reflected in the low consistency 
indices (74-77) of the various analyses of ma-
trices for nodes of subtrees that represent the
paralogy-free fraction of the data. 
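
The consistency and retention indices quoted throughout can be reproduced, at least approximately, from tree length. The sketch below makes two assumptions: each binary character requires a minimum of one step, and the percentage is truncated rather than rounded, as the values in the legend of figure 13 suggest. The function names are illustrative.

```python
def consistency_index(min_steps, length):
    """Ensemble ci as a truncated percentage: 100 * M / S.

    min_steps: minimum conceivable steps summed over characters (M)
    length:    observed tree length (S)
    """
    return 100 * min_steps // length

def retention_index(min_steps, length, max_steps):
    """Ensemble ri as a truncated percentage: 100 * (G - S) / (G - M).

    max_steps: maximum conceivable steps summed over characters (G)
    """
    return 100 * (max_steps - length) // (max_steps - min_steps)

# Component matrices of fig. 13: one step minimum per binary character
print(consistency_index(33, 49))  # 33 nodes of combined cladograms, length 49
print(consistency_index(25, 41))  # 25 nodes of combined subtrees, length 41
print(consistency_index(15, 21))  # 15 nodes of individual subtrees, length 21
```

The retention index additionally requires the maximum number of steps for each character, which depends on the distribution of 0- and 1-entries and is therefore left as an input here.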


NORTH AMERICAN 
FRESHWATER FISHES 
(Mayden, 1988) 


PARSIMONY ANALYSIS OF NODES 


COMBINED FISH CLADOGRAMS 
(FIGS. 18, 19 LEFT)


In the most ambitious undertaking of its 
kind, Mayden (1988) offered a geographic 
analysis of cladograms for seven species 
groups of fishes distributed among 34 major 
rivers of the central United States. Clado- 
grams for the seven species groups can be 

Fig. 19. Strict consensus trees. Left (10 nodes), of 89 trees (length 72, ci 38, ri 82), from parsimony 
analysis of component matrix (table 10, no assumption 2) for 28 nodes of combined cladograms (fig. 
18). Right (6 nodes), of 58 trees (length 75, ci 37, ri 80) from parsimony analysis of component matrix 
(table 10, assumption 2) for 28 nodes of combined cladograms (fig. 22). 


combined as one cladogram of 29 terminal 
taxa and 28 nodes (fig. 18). Of these, seven 
nodes are paralogous (nodes 0-3, 6, 12, 24). 
Table 10 is a component matrix for all 28 
nodes. Parsimony analysis of this matrix (no 
assumption 2) yields 89 trees, of which the 
strict consensus has 10 informative nodes (fig. 
19, left). 
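
A strict consensus retains only the groups found in every tree. A minimal sketch, assuming trees are written as nested tuples of area labels (a representation chosen here for illustration, not that of the programs cited):

```python
def clusters(tree):
    """Return the set of leaf sets, one frozenset per internal node."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    leaves(tree)
    return found

def strict_consensus_clusters(trees):
    """Clusters present in every one of the trees."""
    return set.intersection(*(clusters(t) for t in trees))

# Two hypothetical area cladograms that disagree within the group ASZ
t1 = ("F", ("Z", ("A", "S")))
t2 = ("F", ("A", ("S", "Z")))
shared = strict_consensus_clusters([t1, t2])
print(sorted("".join(sorted(c)) for c in shared))
```

Counting the shared clusters that contain more than one area but fewer than all of them gives the number of informative nodes of the consensus.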


SUBTREE ANALYSIS OF COMBINED FISH 
CLADOGRAMS (FIG. 20) 


Mayden’s combined cladograms (fig. 18) 
reduce to eight subtrees (fig. 20), which to- 
gether include 15 informative nodes. Sub- 
trees 1-6 are no different from the clado- 
grams of six of Mayden’s seven species groups. 
Subtrees 7 and 8 represent the paralogy-free 
fractions of the cladogram for the remaining 
(seventh) species group, which contains one 
geographically paralogous node (fig. 18, node 
24)—the only paralogous node in Mayden’s 
seven cladograms, if each of them is consid- 
ered separately from the others (as an indi- 
vidual subtree). 
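
A test for geographic paralogy at a node can be sketched as below. This is a simplified reading, in which a node counts as paralogous when the area sets of its branches overlap. The tree representation (tuples for nodes, sets of areas for terminal taxa) is an assumption made for the example, and the sketch does not attempt the reduction to subtrees.

```python
def areas_below(node):
    """Areas of a node: a terminal is a set of areas, an internal node a tuple."""
    if isinstance(node, (set, frozenset)):
        return set(node)
    result = set()
    for child in node:
        result |= areas_below(child)
    return result

def is_paralogous(node):
    """True if any two branches of the node share an area."""
    seen = set()
    for child in node:
        child_areas = areas_below(child)
        if seen & child_areas:
            return True
        seen |= child_areas
    return False

# Hypothetical cladogram: area A is duplicated across the basal node
inner = ({"B"}, {"C"})
basal = ({"A"}, ({"A"}, inner))
print(is_paralogous(basal))  # branches share area A
print(is_paralogous(inner))  # branches B and C do not overlap
```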


INDIVIDUAL FISH SUBTREES (FIG. 21) 


For the 15 nodes of the individual subtrees 
(fig. 20), parsimony analysis of a component 
matrix (table 11) yields 5302+ (overflow) 
trees (length 16), of which the strict consensus 
contains two informative nodes (areas 1-33; 
areas 20-27). Among the 5302 trees, the de- 
gree of resolution (number of informative 
nodes) ranges from 7 to 24 (table 12). One 
tree (fig. 21, left) is least resolved (seven 
nodes). This tree has one spurious node, and 
some areas are placed higher in the tree than 
is warranted by the data. This tree (length 16) 
collapses to yield a tree of least resolution, 
also of length 16 (fig. 21, right). This minimal 
tree (six nodes) is offered as an exact result 
of parsimony analysis of the matrix for the 
paralogy-free fraction of data of all nodes of 
the seven cladograms of Mayden, as rendered 
by subtree analysis (fig. 20 and table 11). This 
minimal tree is nearly the same as that pre-
viously obtained by hand resolution for as-
sumption 2 (Nelson and Ladiges, 1991a: fig.
3B; see below). 
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TABLE 10 
Component Matrix for Mayden’s (1988) Combined Cladograms* 


Nodes of 


0000000000 
Combined Cladograms: 0123456789 


OUTGROUP 
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26 LICKING 

27 KENTUCKY 

28 SALT (KENTUCKY) 
29 GREEN 

30 CUMBERLAND 

31 DUCK 
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2 Matrix for 28 nodes of combined cladograms for seven species groups of North 
American freshwater fishes (fig. 18), no assumption 2. For assumption 2 (fig. 22), 
certain 1-entries become 0-entries: area 03, node 26; area 04, nodes 8, 15, 19; area
32, nodes 5, 10, 13, 18, 21; area 33, nodes 8, 14. 
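
The changes listed in the footnote can be applied mechanically. In the sketch below the list of (area, node) changes is taken from the footnote, whereas the matrix itself is a placeholder: it holds 1-entries only at the listed positions and is not the published matrix. The function name and the data layout are assumptions.

```python
# (area, nodes) pairs from the footnote to table 10: under assumption 2
# these 1-entries become 0-entries.
ASSUMPTION_2_CHANGES = {
    3: [26],
    4: [8, 15, 19],
    32: [5, 10, 13, 18, 21],
    33: [8, 14],
}

def apply_assumption_2(matrix, changes=ASSUMPTION_2_CHANGES):
    """Return a copy of matrix with the listed 1-entries set to 0.

    matrix: dict mapping area number -> list of entries ("0", "1" or "?"),
            indexed by node number (0-27).
    """
    revised = {area: list(row) for area, row in matrix.items()}
    for area, nodes in changes.items():
        for node in nodes:
            if revised[area][node] != "1":
                raise ValueError(f"area {area}, node {node} is not a 1-entry")
            revised[area][node] = "0"
    return revised

# Placeholder rows (not the published data): 1 only where a change is listed
placeholder = {
    area: ["1" if node in nodes else "0" for node in range(28)]
    for area, nodes in ASSUMPTION_2_CHANGES.items()
}
revised = apply_assumption_2(placeholder)
print("".join(revised[4]))
```

The input matrix is left unchanged, so the two versions (no assumption 2, assumption 2) can be analyzed side by side.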


Three-item analysis of the 15 nodes of in- 
dividual subtrees yields 971 statements. Par- 
simony analysis of a uniformly weighted ma- 
trix yields 1757+ (overflow) trees, length 993, 
ci 97, ri 97, of which the strict consensus has 
one informative node (areas 1-33). Of the 
1757 trees, least resolved are 14 trees (24 
nodes, range 24-32). Of the 14 trees, one is 
least informative (2844 independent state-
ments, range 2844-3068). This tree collapses
to the same minimal tree, length 993 (fig. 21, 
right). Parsimony analysis of a fractionally 


weighted (x 10) matrix yields 1758+ (over- 
flow) trees, length 4620, ci 95, ri 95, of which 
the strict consensus has one informative node 
(areas 1-33). Of the 1758 trees, one is least 
resolved (22 nodes, range 22-32). This tree 
collapses to the same minimal tree, length 
4620 (fig. 21, right). 


RATIONALE OF ASSUMPTION 2 


As noted previously (Nelson and Ladiges, 
1991a: 48), conflicting elements among the
seven cladograms are removed by applica- 
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tion of assumption 2, which reduces the geo- 
graphic distribution of six species (fig. 22: 
species 2, 5, 8, 13, 24, 28). 


SUBGENUS OZARKIA OF ETHEOSTOMA 
(FIGS. 18, 22: NODE 18) 


There is conflict with respect to area 3 (fig. 
18): node 27 relates areas 3 and 5-8; node 26 
relates areas 3-4 and 2. For node 27, E. punc-
tulatum A (species 27) is endemic in area 3;
for node 26, E. cragini (species 28) is wide- 
spread in areas 3-4. Under assumption 2, E. 
cragini is seen as possibly endemic to area 4, 
with secondary dispersal (mobilism) into area 
3. Application of assumption 2 reduces the 
distribution of E. cragini to area 4 (the re- 
lationship of area 3 is determined by the en- 
demic, E. punctulatum A, of which the dis- 
tribution is unreducible). 
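
The reduction just described amounts to simple set arithmetic. The following is a minimal sketch, not the authors' program: the function name and the representation of distributions as sets of area numbers are assumptions, and the area "determined by an endemic" is supplied by hand, as in the text.

```python
def apply_assumption_2(widespread: set[int], endemic_areas: set[int]) -> set[int]:
    """Remove from a widespread distribution the areas whose relationship
    is determined by an endemic at a conflicting node.

    A distribution is never reduced to nothing: if every area would be
    removed, the original distribution is returned unchanged.
    """
    reduced = widespread - endemic_areas
    return reduced if reduced else set(widespread)


# E. cragini (species 28) occupies areas 3-4; area 3 is claimed by the
# endemic E. punctulatum A (species 27) at the conflicting node 27.
cragini = apply_assumption_2({3, 4}, {3})
print(sorted(cragini))
```

The guard against an empty result mirrors the remark that the distribution of an endemic is itself unreducible.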


LUXILUS ZONATUS SPECIES-GROUP 
(FIGS. 18, 22: NODE 8) 


There is conflict with respect to area 4 (fig. 
18): node 19 relates areas 1, 3-4 and 5; node
26 (fig. 22) relates areas 2 and 4 (note that 
area 3 is removed from the distribution of 
E. cragini, see above). For node 19, L. pils- 
bryi (species 7) is endemic in area 5, and L. 
cardinalis (species 8) is widespread in areas 
1, 3-4; for node 26, E. cragini (species 28) is 
possibly endemic in area 4 (see above), and 
E. pallididorsum (species 29) is endemic in 
area 2. Under assumption 2, L. cardinalis is 
seen as possibly endemic to areas 1 and 3, 
with secondary dispersal (mobilism) into area 
4. Application of assumption 2 reduces the 
distribution of L. cardinalis to areas 1 and 3 
(the relationship of area 4 is determined by 
the possible endemic, E. cragini, of which 
the distribution is not further reducible). 

There is conflict with respect to area 33 (fig. 18): node 14 relates
areas 32-33 and 34; node 16 relates areas 2, 5-9, 19, and 33. For node
14, L. zonistius (species 4) is endemic in area 34, and L. coccogenis
(species 5) is widespread in areas 32-33; for node 16, P. tanasi
(species 16) is endemic in area 33. Under assumption 2, L. coccogenis
is seen as possibly endemic in area 32, with secondary dispersal
(mobilism) into area 33. Application of assumption 2 reduces the
distribution of L. coccogenis to area 32 (the relationship of area 33
is determined by the endemic, P. tanasi, of which the distribution is
unreducible).

Subtree 1
 1 chrosomus (34)
 2 leuciodus (29-33)
 3 nubilus (1-12, 14-15)

Subtree 2
 4 zonistius (34)
 5 coccogenis (32-33)
 6 zonatus (6-12)
 7 pilsbryi (5)
 8 cardinalis (1, 3-4)

Subtree 3
 9 effusus (29-31)
10 biguttatus (5-22, 27)
11 asper (1-4)

Subtree 4
12 stellifer (34)
13 catenatus east (27-33)
14 catenatus west (1-3, 5-12, 19)

Subtree 5
15 antesella (34)
16 tanasi (33)
17 uranidea (2, 5-9, 19)

Subtree 6
18 blennius (31-33)
19 variatum (20-23, 25-27)
20 kanawhae/osburni (24)
21 tetrazonum (10-12)
22 euzonum (5-8)

Subtree 7
23 trisella (34)
24 boschungi (31-32)
25 punctulatum (10-11)
26 punctulatum B (5-8)
27 punctulatum A (3)

Subtree 8
23 trisella (34)
24 boschungi (31-32)
28 cragini (3-4)
29 pallididorsum (2)

Fig. 20. Subtrees derived from combined cladograms (fig. 18, no
assumption 2). [Terminal taxa listed by subtree; branching diagrams not
recoverable.]


OTHER OCCURRENCES IN AREA 32 


Under assumption 2, other widespread taxa occurring in area 32 are seen
as possibly occurring there because of secondary dispersal (mobilism):
N. leuciodus (fig. 18: node 13, species 2), F. catenatus east (node 10,
species 13), and E. boschungi (node 21, species 24). Application of
assumption 2 reduces the distribution of each of these species (fig.
22) such that area 32 is removed therefrom (the relationship of area 32
is determined by the possible endemic, L. coccogenis of node 14, of
which the distribution is not further reducible).

Fig. 21. Results of parsimony analysis of component matrix (table 11,
no assumption 2) for 15 nodes of subtrees (fig. 20). Left (seven
nodes), least resolved (table 12) of 5302+ (overflow) trees (length 16,
ci 93, ri 97). Right (six nodes), minimal tree (length 16) derived from
least resolved (left) tree by collapse of nodes and more basal
placement of some areas. The same minimal tree results from parsimony
analysis of three-item matrices (see text).


PARSIMONY ANALYSIS OF NODES, 
ASSUMPTION 2 
COMBINED FISH CLADOGRAMS 
(FIGS. 22, 19 RIGHT) 


For the combined cladograms (figs. 18, 22),
application of assumption 2 eliminates par- 
alogy for one node (node 24), leaving six par- 
alogous nodes (nodes 0-3, 6, 12). Parsimony 
analysis of a component matrix for the nodes 
of the combined cladograms (table 10, as- 
sumption 2) yields 58 trees, of which the strict 
consensus has six informative nodes (fig. 19, 
right). Comparison of figure 19 left and right 
shows the effects of application of assump- 
tion 2 on the results of parsimony analysis 
of a component matrix for all nodes of the 
combined cladograms: 


1. overall reduction in resolution (from 10 to 
6 informative nodes); 


2. increased resolution within one grouping 
(areas 20-27) with addition of 1 novel node 
(areas 23-26); 

3. different groupings of the same areas (areas 
2 and 4; areas 32 and 34). 


INDIVIDUAL FISH SUBTREES (FIGS. 23, 24) 


For the 15 nodes of the individual subtrees, some affected by
assumption 2 (fig. 23), a component matrix (table 13) yields 5299+
(overflow) trees (length 15), of which the strict consensus has two
informative nodes (areas 32, 34; areas 1-31, 33). Among the 5299 trees,
the degree of resolution (number of informative nodes) ranges from 9 to
28 (table 14). Four trees are least resolved (nine nodes). The strict
consensus of these four trees has three informative nodes (areas 32,
34; areas 29-31, 33; areas 1-28). Among these four trees, the amount of
their data (number of independent three-item statements) ranges from
1340 to 1508 (table 15). One tree (fig. 24, left) is least informative
(1340 statements). This tree (length 15) collapses to yield a tree of
least resolution, also of length 15 (fig. 24, right). This minimal tree
(eight nodes) is offered as an exact result of parsimony analysis of
the matrix for the paralogy-free fraction of data of all nodes of the
seven cladograms of Mayden, as rendered by subtree analysis for
assumption 2. This minimal tree is the same as that previously obtained
by hand resolution (Nelson and Ladiges, 1991a: fig. 3B).

TABLE 11
Component Matrix for Subtrees Derived from
Mayden’s Combined Cladogramsᵃ

[Matrix entries not recoverable.]

ᵃ Matrix for 15 informative nodes of eight subtrees
(fig. 20), no assumption 2.

Three-item analysis of the 15 nodes of individual subtrees, some
affected by assumption 2 (fig. 23), yields 913 statements. Parsimony
analysis of a uniformly weighted matrix yields 1757+ (overflow) trees,
length 913, ci 100, ri 100, of which the strict consensus has two
informative nodes (areas 1-31, 33; areas 32 and 34). Of the 1757 trees,
one tree is least resolved (16 nodes, range 16-32). This tree collapses
to the same minimal tree, length 913 (fig. 24, right). Parsimony
analysis of a fractionally weighted (×10) matrix yields 1757+
(overflow) trees, length 4206, ci 100, ri 100, of which the strict
consensus has two informative nodes (areas 1-31, 33; areas 32 and 34).
Of the 1757 trees, one tree is least resolved (19 nodes, range 19-32).
This tree collapses to the same minimal tree, length 4206 (fig. 24,
right).

TABLE 12
Number of Nodes of 5302 Trees Derived from
Mayden’s Dataᵃ

Number of              Number of
Nodes       Trees      Nodes       Trees
24          441        15          677
23          510        14          323
22          495        13          179
21          458        12          116
20          375        11           55
19          365        10           48
18          325         9           39
17          290         8           11
16          594         7            1

ᵃ Resulting from parsimony analysis of component
matrix for subtree nodes (table 11).

 1 chrosomus (34)
 2 leuciodus (29-31, 33)*
 3 nubilus (1-12, 14-15)
 4 zonistius (34)
 5 coccogenis (32)*
 6 zonatus (6-12)
 7 pilsbryi (5)
 8 cardinalis (1, 3)*
 9 effusus (29-31)
10 biguttatus (5-22, 27)
11 asper (1-4)
12 stellifer (34)
13 catenatus east (27-31, 33)*
14 catenatus west (1-3, 5-12, 19)
15 antesella (34)
16 tanasi (33)
17 uranidea (2, 5-9, 19)
18 blennius (31-33)
19 variatum (20-23, 25-27)
20 kanawhae/osburni (24)
21 tetrazonum (10-12)
22 euzonum (5-8)
23 trisella (34)
24 boschungi (31)*
25 punctulatum (10-11)
26 punctulatum B (5-8)
27 punctulatum A (3)
28 cragini (4)*
29 pallididorsum (2)

Fig. 22. Combination of cladograms for seven species groups (cf. fig.
18). Asterisk (*), species with geographic distribution reduced
according to assumption 2. [Terminal taxa only; branching diagram not
recoverable.]

Fig. 23. Subtrees derived from combined cladograms (fig. 22). Asterisk
(*), species with geographic distribution reduced according to
assumption 2 (cf. fig. 22).

Comparison of figures 21 (right) and 24 (right) shows effects of
application of assumption 2 on results of parsimony analysis of
matrices (component and three-item) for nodes of individual subtrees:


1. overall increase in resolution (from six to
eight informative nodes);

2. increased resolution within one grouping
(areas 1-27) with addition of one novel
node (areas 2, 4);

3. different grouping of the same areas (areas
32, 34).


TABLE 13
Component Matrix for Subtrees Derived from
Mayden’s Combined Cladogramsᵃ

[Matrix entries not recoverable.]

ᵃ Matrix for 15 informative nodes of seven subtrees
(fig. 23), assumption 2.


PARSIMONY ANALYSIS OF NODES 
AND SPECIES DISTRIBUTIONS 
COMBINED FISH CLADOGRAMS (FIG. 25) 


In his published matrix, Mayden included a binary character describing
the distribution of each species (table 16, species 1-29). These binary
characters, and multistate characters describing nodes of cladograms of
the seven species groups (table 20, MS4-18) constitute his matrix. The
two classes of characters differ. Those for nodes of cladograms have
?-entries for missing data; those for species do not. Missing data
reflect absence of particular areas from distributions of species
related by a particular cladogram (the cladogram is seen as an
individual subtree).

Fig. 24. Results of parsimony analysis of component matrix (table 13,
assumption 2) for 15 nodes of subtrees (fig. 23). Left (nine nodes),
one of four least resolved (table 14) of 5299+ (overflow) trees (length
15, ci 100, ri 100); of the four trees this one is least informative
(table 15, 1340 independent three-item statements). Right (eight
nodes), minimal tree (length 15) derived from least resolved (left)
tree by collapse of nodes and more basal placement of some areas. The
same minimal tree results from parsimony analysis of three-item
matrices (see text).

TABLE 14
Number of Nodes of 5299 Trees Derived from
Mayden’s Dataᵃ

Number of              Number of
Nodes       Trees      Nodes       Trees
28           64        18          342
27          184        17          324
26         2272        16          247
25          184        15          264
24          147        14          124
23          129        13           51
22          764        12           37
21          858        11           32
20          765        10           19
19          538         9            4

ᵃ Resulting from parsimony analysis of component
matrix for subtree nodes (table 13). See also table 15.

Binary characters describing distributions 
of species may be seen in two ways: (1) as 
presence/absence data independent of any 
tree; (2) as representations of terminal nodes 
of a tree, relating organisms of each species. 


TABLE 15
Number of Independent Three-Item Statementsᵃ

Number of
Statements    Trees
1508          1
1458          1
1442          1
1340          1

ᵃ Of the four least-resolved trees (table 14, nine nodes).

Fig. 25. Results of parsimony analysis of complete component matrices
(combined matrices of tables 10 and 16; 28 nodes and 29 species
distributions) for combined cladograms (figs. 18, 22). Left (18 nodes,
no assumption 2), strict consensus of two trees (length 124, ci 45, ri
80). Right (23 nodes, assumption 2), one tree (length 122, ci 46, ri
79).


Seen as a terminal node, 1-entries of each
binary character are related together relative
to 0-entries. For the fish distributions, such
a binary character without missing data in- 
cludes 0-entries that result from comparison 
across paralogous nodes of the combined 
cladograms. Binary characters without miss- 
ing data (table 16) may be deemed appro- 
priate representations of terminal nodes of 
combined cladograms that include paralo- 
gous nodes (figs. 18, 22). Such binary char- 
acters are not appropriate for terminal nodes 
of individual subtrees (figs. 20, 23), because 
no subtree relates species occurring in all ar- 
eas. As representations of terminal nodes, all 
such binary characters (table 16) include par- 
alogous 0-entries. 

Data for all nodes (table 10) and all species distributions (table 16)
can be combined in a “complete” component matrix with maximum
paralogy. Parsimony analysis of this matrix without assumption 2 yields
two trees, with a strict consensus of 18 informative nodes (fig. 25,
left). Parsimony analysis of the matrix with assumption 2 yields one
tree of 23 nodes (fig. 25, right).

Comparison of figure 25 left and right shows
the effects of the application of assumption 
2 on results of parsimony analysis of a com- 
plete component matrix for the combined 
cladograms (nodes and species distributions): 


1. overall increase in resolution (from 18 to 
23 informative nodes); 

2. increased resolution within one grouping
(areas 20-27), with addition of two novel nodes
(areas 20-23, 25-27; areas 20-22, 27);

3. many different groupings of the same areas. 


These effects contrast with those observed in comparison of results
achieved only for nodes (fig. 19), which show an overall reduction in
resolution, one novel node (areas 23-26), and fewer different
groupings of the same areas.


INDIVIDUAL FISH SUBTREES (FIG. 26) 


TABLE 16
Binary Matrix for Geographic Distribution of Fish Speciesᵃ

Species:              00000000011111111112222222222
                      12345678901234567890123456789

OUTGROUP              00000000000000000000000000000
01 KIAMICHI           00100001001001000000000000000
02 OUACHITA           00100000001001001000000000001
03 MIDDLE ARKANSAS    00100001001001000000000000110
04 UPPER ARKANSAS     00100001001000000000000000010
05 WHITE              00100010010001001000010001000
06 ELEVEN POINT       00100100010001001000010001000
07 CURRENT            00100100010001001000010001000
08 BLACK              00100100010001001000010001000
09 ST FRANCIS         00100100010001001000000000000
10 OSAGE              00100100010001000000100010000
11 GASCONADE          00100100010001000000100010000
12 MERAMEC            00100100010001000000100000000
13 SALT (MISSOURI)    00000000010000000000000000000
14 DES MOINES         00100000010000000000000000000
15 UPPER MISSISSIPPI  00100000010000000000000000000
16 ILLINOIS           00000000010000000000000000000
17 KASKASIA           00000000010000000000000000000
18 BIG MUDDY          00000000010000000000000000000
19 WABASH             00000000010001001000000000000
20 MIAMI              00000000010000000010000000000
21 SCIOTO             00000000010000000010000000000
22 UPPER OHIO         00000000010000000010000000000
23 LOWER KANAWHA      00000000000000000010000000000
24 UPPER KANAWHA      00000000000000000001000000000
25 BIG SANDY          00000000000000000010000000000
26 LICKING            00000000000000000010000000000
27 KENTUCKY           00000000010010000010000000000
28 SALT (KENTUCKY)    00000000000010000000000000000
29 GREEN              01000000100010000000000000000
30 CUMBERLAND         01000000100010000000000000000
31 DUCK               01000000100010000100000100000
32 LOWER TENNESSEE    01001000000010000100000100000
33 UPPER TENNESSEE    01001000000010010100000000000
34 MOBILE BASIN       10010000000100100000001000000

ᵃ For geographic distributions of 29 species (after Mayden, 1988:
335-337, table 1, see note below), no assumption 2. For assumption 2,
certain 1-entries become 0-entries: area 3, species 28; area 4, species
8; area 32, species 2, 13, 24; area 33, species 5.
Note: In his matrix Mayden treated species 13 and 14 as one species;
they are here treated as separate species, save for the analysis of his
corrected matrix (fig. 29, right).

Paralogy may be eliminated from the binary characters describing
species distributions by replacing certain 0-entries by ?-entries for
missing data (table 17) for both series of individual subtrees
considered above (figs. 20, 23). Revised characters for species
distributions combine with characters for nodes to make two complete
matrices: with and without assumption 2. Parsimony analysis of each
complete subtree matrix yields one minimal tree (fig. 26, left and
right). As rendered by subtree analysis, these minimal trees are
offered as exact results for the paralogy-free fraction of the complete
data, including species distributions as if the distributions were
terminal nodes of individual subtrees.
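
The recoding of 0-entries as ?-entries can be illustrated for a single species. The sketch below is one reading of the rule stated earlier (missing data reflect absence of an area from the distributions of all species related by the subtree); the function and variable names are hypothetical, and the areas of subtree 2 are taken from figure 20.

```python
def code_terminal(species_areas, subtree_areas, all_areas):
    """Code one species distribution as a terminal node of its subtree.

    '1' = species present; '0' = area occupied by another species of the
    same subtree; '?' = area absent from the subtree (missing data).
    """
    return "".join(
        "1" if area in species_areas
        else "0" if area in subtree_areas
        else "?"
        for area in all_areas
    )


areas = range(1, 35)                      # areas 01-34
cardinalis = {1, 3, 4}                    # species 8, no assumption 2
# Areas occupied by the five species of subtree 2 (fig. 20).
subtree_2 = {34} | {32, 33} | set(range(6, 13)) | {5} | cardinalis

binary = code_terminal(cardinalis, set(areas), areas)   # no missing data
revised = code_terminal(cardinalis, subtree_2, areas)   # paralogy-free
print(binary)
print(revised)
```

In the revised string, areas 2 and 13-31 carry ?-entries because no species of subtree 2 occurs there; only the remaining 0-entries bear on area relationships.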

Comparison of figure 26 left and right shows
effects of application of assumption 2 on results
of parsimony analysis of a complete matrix
for subtrees (nodes and species distributions
seen as terminal nodes of individual
subtrees):


1. overall increase in resolution (from 11 to
13 informative nodes);

2. increased resolution within one grouping
(areas 1-27) with addition of one novel node
(areas 2, 4);

3. different groupings of the same areas (areas
32, 34).


Fig. 26. Minimal trees from parsimony analysis of complete component
matrices for subtrees (cladogram nodes and species distributions). Left
(11 nodes), no assumption 2. Right (13 nodes), assumption 2. Complete
matrix for left tree (CML) combines matrices for 15 nodes (table 11)
and 19 species distributions (table 17, left, species 2-28). Complete
matrix for right tree (CMR) combines matrices for 15 nodes (table 13)
and 16 species distributions (table 17, right, species 2-26). Parsimony
analysis of CML yields 5303+ (overflow) trees (length 39, ci 87, ri
95). Their strict consensus has three informative nodes (areas 1-33;
areas 1-4; areas 20-27). Of 5303 trees, 10 are least resolved (17
nodes, range 17-27), of which two are least informative (2234
independent three-item statements, range 2234-2372). The two trees
differ in placement of area 5 (either with areas 1-4 or with areas
6-9). The strict consensus of the two trees has length 40. With
collapse of nodes and more basal placement of some areas, the consensus
yields the left Minimal Tree (length 40). Parsimony analysis of CMR
yields 5298+ (overflow) trees (length 38, ci 81, ri 92). Their strict
consensus has one informative node (areas 2, 4). Of 5298 trees, 7 are
least resolved (16 nodes, range 16-31), of which 1 tree is least
informative (1920 independent three-item statements, range 1920-2352).
With collapse of nodes and more basal placement of some areas this tree
yields the right Minimal Tree (length 38).


These effects are similar to those observed in 
comparison of results achieved only for nodes 
(fig. 21, right; fig. 24, right). 


DISCUSSION OF RESULTS OF 
PARSIMONY ANALYSES 
AREAS WITH IDENTICAL CHARACTER 
STRINGS 


There is variety in results of the above analyses. There is commonality
as well, some of which stems from identical data for different areas.
In all component matrices for nodes, certain areas have identical
character strings, and parsimony analysis consequently places them
together: areas 6-8; areas 10-11; areas 13, 16-18; areas 14-15; areas
20-22; areas 23-26; areas 29-30. The same applies to all component
matrices for nodes and species distributions, with exception of areas
23-26, which are reduced to areas 23, 25-26. Areas of any one of these
seven groups seldom appear themselves as the only areas related by a
node in trees resulting from parsimony analysis, but among 95 nodes of
those trees, 10 (11%) relate areas only of one of the seven groups. For
matrices of nodes, none of

TABLE 17
Component Matrices for Informative Distributions of Fish Speciesᵃ

Species: 0000001111111222222 0000011111112222
2356890134789124568 2368901347891256

OUTGROUP 0000000000000000000 0000000000000000 
01 KIAMICHI 0100100101777777777 0101001017777 727 
02 OUACHITA 017?7700101177770770 0177001011777700 
03 MIDDLE ARKANSAS 0100100101777770001 01010010177772700 
04 UPPER ARKANSAS 0100100177777770771 0177001777777 700 
05 WHITE 010000100110001001?7? 0100010011000101 
06 ELEVEN POINT 010100100110001001? 0110010011000101 
07 CURRENT 010100100110001001? 0110010011000101 
08 BLACK 010100100110001001?7? 0110010011000101 
09 ST FRANCIS 0101001001177777777 0110010011772772? 
10 OSAGE 010100100170010010? 0110010017001010 
11 GASCONADE 010100100170010010? 0110010017001010 
12 MERAMEC 01010010017001077?7?7 01100100170010?7? 
13 SALT (MISSOURD 77777901077777979279777? 2777010779777 277? 
14 DES MOINES 0177701077777 777777? 017701077777 7727 
15 UPPER MISSISSIPPI 017770107777972797777? 0177010777272 277? 
16 ILLINOIS 7777701077777 772797977 729777701077777777? 
17 KASKASIA 277727701079 777979777777 2977701077777 72727?7 
18 BIG MUDDY 2777770109797 77977727977 297777010777772792727? 
19 WABASH 7777701001179 77727977 7777010011777227? 
20 MIAMI 72777010777010072777 2777701077790100?7? 
21 SCIOTO 777770107770100777?7 77770107770100?? 
22 UPPER OHIO 777977010777010079797?7? 77770107770100?? 
23 LOWER KANAWHA ?7777779797770100272777 727797777777 770100?? 
24 UPPER KANAWHA ?7777797779770000277?77 277777777 7270000?? 
25 BIG SANDY 7727979777777 701007777 779797797977 701002?7 
26 LICKING 277979777279 772970100727977 77977777777 70100?2? 
27 KENTUCKY 27277770101070100777?7? 2?7770101070100?27 
28 SALT (KENTUCKY) ?7777777107777272777 2777777710722 27272777 
29 GREEN 10?7710010777777777 1077100107777272? 
30 CUMBERLAND 107?77100107777772777 1077100107777777 
31 DUCK 1072?710010710001000 1077100107100000 
32 LOWER TENNESSEE 1010077710710001000 ?70077797771000?7? 
33 UPPER TENNESSEE 1010077710010007?777 10777771001000?2? 
34 MOBILE BASIN 0000077700077?770000 000077?700077?7700 


ᵃ For informative geographic distributions of 29 species considered as terminal nodes of
subtrees: left (species 2-28), no assumption 2 (fig. 20); right (species 2-26), assumption 2 (fig. 23).


the seven groups has any unique character; for matrices including
species distributions, only one (10-11) has. Otherwise, areas of one or
another of the seven groups appear related by a node because of
optimization of homoplastic characters. As such, these nodes are
possibly artifactual. If the resulting trees are divided into two
classes according to the nature of their matrix (combined cladograms
versus individual subtrees), then 9 of the 10 nodes occur in the
combined class (16% of 57 nodes), and only one appears in the
individual class (3% of 38 nodes). These findings suggest that
paralogy, which is maximal in matrices for combined cladograms and
absent from matrices for individual subtrees, results in artifactual
nodes.
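
Grouping areas by identical character strings is easy to reproduce. The sketch below uses ten rows copied from table 16 (species distributions only), so it illustrates the mechanism rather than the node matrices discussed above; the variable names are hypothetical. The percentages quoted in the text are recomputed at the end.

```python
from collections import defaultdict

# Selected rows of table 16 (area number -> character string).
rows = {
    6:  "00100100010001001000010001000",
    7:  "00100100010001001000010001000",
    8:  "00100100010001001000010001000",
    9:  "00100100010001001000000000000",
    23: "00000000000000000010000000000",
    24: "00000000000000000001000000000",
    25: "00000000000000000010000000000",
    26: "00000000000000000010000000000",
    29: "01000000100010000000000000000",
    30: "01000000100010000000000000000",
}

# Areas sharing a character string fall into the same bucket.
by_string = defaultdict(list)
for area, string in rows.items():
    by_string[string].append(area)
identical = [group for group in by_string.values() if len(group) > 1]
print(identical)

# Share of nodes relating only areas of one group, as rounded percentages.
shares = [round(100 * k / n) for k, n in ((10, 95), (9, 57), (1, 38))]
print(shares)
```

Area 24 separates from areas 23 and 25-26 here because it alone holds species 20, which matches the reduction of group 23-26 to 23, 25-26 once species distributions are included.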


EFFECTS OF PARALOGY 


Effects of paralogy may be assessed by comparison of the consistency
index of trees derived by parsimony analysis of the two classes of
matrices (table 18). The consistency index of trees derived by
parsimony analysis of matrices for individual subtrees is higher by a
factor of two. Number of nodes resolved is lower by a third. These
findings suggest that paralogy contributes both artifactual
inconsistency to matrices for combined cladograms and artifactual nodes
to results of parsimony analysis of these matrices.

TABLE 18
Consistency Index (ci) and Number of Nodes of Treesᵃ

                                Nodes and species
                Nodes only      distributions      Average   Total
Matrix class    no A2    A2     no A2    A2        ci        nodes
Combined
  ci            38       37     45       46        42
  Nodes         10        6     18       23                  57
Individual
  ci            93      100     87       81        90
  Nodes          6        8     11       13                  38

ᵃ Derived from two classes of matrices (combined and individual).
Results for combined matrices are shown in figures 19, left and right,
and 25, left and right. Results for individual matrices are shown in
figures 21, right; 24, right; and 26, left and right. Abbreviations: no
A2, no assumption 2; A2, assumption 2.
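
The two summary comparisons (ci higher by a factor of two, nodes lower by a third) follow directly from the entries of table 18. A short check, with a hypothetical data layout:

```python
from statistics import mean

# ci and node counts from table 18, ordered: nodes only (no A2, A2),
# then nodes and species distributions (no A2, A2).
table_18 = {
    "combined":   {"ci": [38, 37, 45, 46],  "nodes": [10, 6, 18, 23]},
    "individual": {"ci": [93, 100, 87, 81], "nodes": [6, 8, 11, 13]},
}

summary = {
    name: {"mean_ci": mean(row["ci"]), "total_nodes": sum(row["nodes"])}
    for name, row in table_18.items()
}
ci_ratio = summary["individual"]["mean_ci"] / summary["combined"]["mean_ci"]
node_ratio = summary["individual"]["total_nodes"] / summary["combined"]["total_nodes"]
print(summary)
print(round(ci_ratio, 2), round(node_ratio, 2))
```

The mean ci values (41.5 and 90.25) correspond to the tabulated averages of 42 and 90, and the node totals of 57 and 38 give a ratio of two thirds.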


EFFECTS OF ASSUMPTION 2 


Effects of assumption 2 are not apparent by such comparison (table 18).
In matrices for combined cladograms, assumption 2 causes some 1-entries
to become 0-entries (data change, and paralogy is apt to increase as a
result). For results of parsimony analysis of such matrices, assumption
2 hardly affects consistency and causes both decrease (nodes only) and
increase (nodes and species distributions) in nodes resolved. In
matrices for individual subtrees, assumption 2 causes some 1- and
0-entries to become ?-entries (conflicting data disappear). For results
of parsimony analysis of such matrices, assumption 2 causes both
increase (nodes only) and decrease (nodes and species distributions) in
consistency, and in both cases causes increase in number of nodes
resolved. The decrease in consistency (from 87 to 81, nodes and species
distributions) may seem anomalous. It may be remembered, however, that
assumption 2 applies not to conflicts among species distributions (seen
as terminal nodes) but only to conflicts arising within and between the
27 nodes of the combined cladograms, and then only to conflicts
involving endemic and widespread species (figs. 18, 22).

Fig. 27. Cladogram (node 8) in which widespread species (zonatus and
cardinalis) are seen as terminal nodes (cf. fig. 22).

The complete matrix for assumption 2 (table 13 for nodes, table 17:
2-26 for species distributions) includes six characters that reduce
consistency of the resulting tree (table 19). If all six characters are
rendered inactive, then the consistency (ci) of the resulting tree is
100. Of the six conflicting species distributions, assumption 2 might
be applied only to one of them (species 18), with reduction of its
distribution to areas 31 and 33 (the relationship of area 32 is
determined by the possible endemic, L. coccogenis; see above). The
resulting consistency (83) would still be a reduction (from 87 for the
result without assumption 2). Distributions of the remaining five
species might, of course, be reduced to yield 100% consistency, but not
by application of assumption 2 (there are no endemic species relevant
to such reduction).
Treating species distributions as terminal 
nodes tends to render absurd notions of en- 
demic and widespread species. Consider, for 
example, the cladogram of node 8 (fig. 27), 
with widespread species zonatus and cardi- 
nalis treated as terminal nodes. The resulting 
terminals are merely areas with organisms 
that lack any distinguishing biological fea- 
tures. All terminals become single areas as if 
each were characterized by an endemic taxon. 
If the node zonatus were reduced in a manner 
analogous with application of assumption 2, 
so as to relate organisms only of areas 6-8, 
or of areas 10-12, then the reduced zonatus 
would be consistent with results of parsimony 
analysis of the entire matrix (fig. 26, right). 
The object of consistency aside, there is no
rationale to govern such reduction. Assumption
2 is inapplicable, therefore, to five of the
six species distributions.

The import of assumption 2 lies in another 
dimension: identification of widespread spe- 
cies of which the distribution conflicts with 
area relationship as determined by endemics. 
In the case of the fishes, conflicts identified 
concern areas 1-4 and areas 31-34. These 
two groups of areas involve major changes 
in drainage pattern, particularly area 4 (Up- 
per Arkansas) and area 32 (Lower Tennes- 
see). At an earlier time, the ““Upper Arkan- 
sas’> was part of the Plains Stream, which 
entered the Gulf of Mexico independently of 
the Mississippi system; the ‘““Lower Tennes- 
see”’ was part of the Old Tennessee River, 
with a possible connection to the Mobile Ba- 
sin (Mayden, 1988: fig. 4). Application of as- 
sumption 2 associates the Upper Arkansas 
and Ouachita (areas 2 and 4), and the Lower 
Tennessee and Mobile Basin (areas 32 and 
34); identifies possible endemics of these old- 
er drainages (E. cragini to the Upper Arkan- 
sas, L. coccogenis to the Lower Tennessee); 
and suggests that other species, occurring there 
today, are not native to the older drainages 
(L. cardinalis in the Upper Arkansas; N. leu- 
ciodus, F. catenatus east, and E. boschungi 
in the Lower Tennessee). These implications 
are testable, perhaps, only indirectly and with 
difficulty. Area relationships (areas 2 and 4; 
areas 32 and 34), however, are more directly 
testable through analysis of other taxa. 

The significance of the six species, of which 
the distribution reduces consistency, lies in a 
yet different dimension. It is possible, of 
course, that each such species at one time was 
endemic to one area, and subsequently be- 
came widespread (all things are possible). The 
distribution of each species, when fitted to 
the tree resulting from parsimony analysis of 
the complete matrix (fig. 26, right), suggests 
another possibility—that each species is a 
complex of forms with diverse relationships. 
This implication is directly testable by further
study of geographic variation of the
widespread species with conflicting distri- 
butions. 

One may note (table 19) that identification 
of these six species with conflicting distri- 
butions and assessment of their possible sig- 
nificance do not require the complete matrix 
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TABLE 19 
Inconsistent Species Distribution of Freshwater 
Fishes? 

Matrix ci of result 
character pas eee 
(Ist =0) Species Distribution fig. 26 fig. 24 

17 6 6-12 83 94 
20 10 5-22, 27 85 88 
21 11 1-4 83 94 
22 13 27-31, 33 83 94 
25 18 31-33 83 94 
30 26 5-8 83 94 


2 Inconsistent with results of parsimony analysis of 
matrices with (fig. 26, from table 13 + table 17: 2-26) 
and without (fig. 24, from table 13) binary characters for 
distribution of 29 species, assumption 2. For figure 26, 
the ci is that of resulting tree when given matrix character 
is rendered inactive (if active, ci = 81). For figure 24, 
the ci is that of the resulting tree when character is ren- 
dered active (if inactive ci = 100). For figure 26, the six 
characters are the only characters with length greater 
than one (characters 17, 21, 22, 25, and 30, length 2; 20, 
length 3). For figure 24, other characters (all inactive) 
have length greater than one (characters 15, 17-19, 23, 
25-27, 29, and 30, length 2; 21 and 22, length 3; and 
20, length 4); the characters with length greater than one 
and not included in table (15, 18, 19, 23, 26, 27, and 
29) are each compatible with the tree such that if the 
character is rendered active, consistency is 100% (the 
resulting tree is different from that of fig. 24). 


(nodes and species distributions; table 13 + 
table 17: 2-26) nor results of parsimony anal- 
ysis of this matrix (fig. 26). Identification of 
species with conflicting distributions may be 
achieved with the matrix for nodes only (ta- 
ble 13) and their significance assessed by 
means of the tree (fig. 24) resulting from par- 
simony analysis of that matrix. 
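
The ci values of table 19 can be checked by simple arithmetic. The sketch below assumes that ci is the minimum number of steps divided by the observed number of steps, reported as a truncated percentage, and that the complete matrix has 31 two-state characters (inferred here from the numbering 0-30 in table 19). Both assumptions, and the function names, are ours for illustration; they are not stated in the table.

```python
def ci_percent(min_steps, observed_steps):
    """Consistency index as a truncated percentage (assumed reporting style)."""
    return (100 * min_steps) // observed_steps

# Assumption: 31 two-state characters (numbered 0-30), one minimum step each.
N_CHARS = 31

# Extra steps on the tree of fig. 26, from the footnote of table 19:
# characters 17, 21, 22, 25, 30 have length 2; character 20 has length 3.
EXTRA_STEPS = {17: 1, 20: 2, 21: 1, 22: 1, 25: 1, 30: 1}

def ci_fig26(inactive=None):
    """ci for the complete matrix, optionally with one character inactive."""
    n_active = N_CHARS - (1 if inactive is not None else 0)
    extra = sum(e for c, e in EXTRA_STEPS.items() if c != inactive)
    return ci_percent(n_active, n_active + extra)

def ci_fig24_with_one_active(length):
    """ci for the 15 node characters (all length 1) plus one active
    species character of the given length."""
    return ci_percent(15 + 1, 15 + length)

print(ci_fig26())                   # all characters active
print(ci_fig26(inactive=17))        # a length-2 character inactive
print(ci_fig26(inactive=20))        # the length-3 character inactive
print(ci_fig24_with_one_active(2))  # one length-2 species character active
```

Under these assumptions the four calls give 81, 83, 85, and 94, in agreement with the corresponding entries of table 19.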

Earlier, we were skeptical of the value of 
including in a matrix characters for distri- 
butions of species, which we saw as mere 
presence/absence (“phenetic”) data independent
of any tree (Nelson and Ladiges, 1991a,
1991b). Viewed as terminal nodes of trees, 
however, species distributions are cast in a 
different light. Nevertheless, we remain skep- 
tical, and unconvinced, of their value. In the 
present case, they add little or nothing to re- 
sults obtainable by parsimony analysis of 
nodes only (cf. figs. 21, 26, left; 24, 26, right). 

If species distributions (seen as terminal 
nodes of individual subtrees) are by them- 
Fig. 28. Minimal tree (nine nodes) from parsimony
analysis of component matrix for 19 species
distributions (table 17, left, species 2-28, no
assumption 2). Parsimony analysis yields 5302+
(overflow) trees (length 21, ci 90, ri 96). Their strict
consensus has three informative nodes (areas 1-19,
24; areas 3-4; areas 20-23, 25-33). Of 5302
trees, 2 are least resolved (13 nodes, range 13-28),
of which 1 is least informative (1908 independent
three-item statements, range 1908-1968). Collapse
of nodes and more basal placement of some
areas yield this minimal tree (length 21). With
assumption 2 (table 17, right, species 2-26), the
minimal tree (eight nodes, length 18, ci 88, ri 96)
is the same but lacks a node relating areas 3-4.


Fig. 29. Strict consensus trees. Left (16 nodes), tree obtained by Mayden (1988: 344), confirmed with 
slight correction of matrix by Nelson and Ladiges (1991a: 44, 57). Right (15 nodes), tree obtained by 
parsimony analysis of Mayden’s matrix, with correction of characters describing cladograms (table 20, 
right, TS4—27). For the full matrix there are 5304+ (overflow) trees (length 143, ci 81, ri 89). For the 
consolidated matrix (areas with duplicate character strings deleted), there are 165 trees (length 143, ci 
81, ri 85). Two-state characters for nodes weighted ×4 (multistate characters for nodes weighted ×2 in
Mayden’s original analysis). Lower differential weights yield less resolution. 
Fig. 30. Results of parsimony analysis of each of Mayden’s multistate characters (table 20, left, MS4-18)
describing seven cladograms of fishes. Cladograms (nodes) 4, 5, 17, 18 are erroneous (*). Cladogram
(node) 17C partially corrected by Nelson and Ladiges (1991a; absent is node 20, uniting nodes 22 and 
23). Errors: node 4, area 34 erroneously included; node 5, erroneous node (areas 1-19); node 17, erroneous 
node (areas 5-34), area 34 erroneously included, no node 20 (uniting nodes 22 and 23); node 18, erroneous 
node (areas 2-8), no node 25 (areas 3-11), no node 27 (areas 3-8). 


selves combined in a component matrix (table
17: 2-28 or 2-26) then parsimony analysis
of this matrix yields a minimal tree (fig.
28, nine nodes) with branching at variance 
with that of trees obtained from parsimony 
analysis of matrices only for subtree nodes 
(figs. 21, right; 24, right). Conflict arises not 
from paralogy (matrices for species distri- 
butions and nodes are paralogy free). It seems 
merely futile, and false, to expect species dis- 
tributions themselves to reflect the historical 
pattern of branching. 


MAYDEN’S ORIGINAL MATRIX (FIG. 29) 


With parsimony analysis of a complete 
matrix (including a binary character repre- 
senting the distribution of each of the 29 spe- 
cies [see below] and multistate characters for 
the nodes of the seven cladograms treated as 


individual subtrees), Mayden obtained 33 
trees, of which the strict consensus (fig. 29, 
left) has 16 informative nodes (confirmed with 
slight correction of the matrix by Nelson and 
Ladiges, 1991a: 57). 

As published by Mayden (1988: 336-337,
table 1), the multistate characters (table 20, 
MS4-18; fig. 30) for four of the seven clado- 
grams of these analyses are inaccurate. Nev- 
ertheless, parsimony analysis of the corrected 
matrix, with cladogram nodes represented by 
component data (table 20, TS4-27; fig. 31),
yields more numerous trees, with a strict con- 
sensus (fig. 29, right, 15 informative nodes) 
only slightly different from Mayden’s result. 

Aside from effects of assumption 2, results 
obtained by Mayden (fig. 29, original and cor- 
rected) are not very different from those 
achieved here for complete matrices (fig. 26). 
The reasons are easy to understand. His ma- 
Fig. 31. Results of parsimony analysis of each series of two-state characters (table 20, right, TS4-27),
correctly describing seven cladograms of fishes. For each series the consistency index (ci) is 100, except 
for that of cladogram (node) 18, in which area 3 appears twice. If the two occurrences of area 3 are 
distinguished (e.g., as 3a and 3b), then the result is as shown, with ci 100. If the two occurrences are 
not distinguished, then the result is as shown for area 3b, with ci 85. 


trix includes characters for nodes as if nodes 
were of subtrees (characters are paralogy free). 
His matrix includes characters for species 
distributions as if distributions were terminal 
nodes of combined cladograms (characters 
are maximally paralogous). Computational 
problems are apparently overcome in this case 
(see below) by differential weighting of characters
for nodes: ×2 for multistates, equivalent
to ×4 for two-state equivalents (fig. 29).


ASSUMPTION ZERO 


To include in a matrix characters for both 
cladogram nodes and species distributions is 
a practice begun by Wiley (1987: 297-299) 
in “an example of how a parsimony analysis
might work, using the same hypothetical taxa
as Humphries and Parenti (1986),” [not] “a
formal method but only a sketch of the general
outlines of a formal method.” Zandee
and Roos (1987) formalized the practice un- 
der the name of “assumption zero.” Wiley 
suggested that species distributions be rep- 
resented as characters without missing data, 
and appropriate nodes of cladograms be rep- 
resented as characters with missing data—a 
suggestion imperfectly implemented in his 
treatment of hypothetical examples (Wiley, 
1987: table 3, a binary matrix wherein most, 
but not all, relevant cladogram nodes are rep- 
resented by characters with missing data— 
column BX is the exception, corrected in his 
figure 10). “The binary coding for groups with
incomplete distributional patterns is...
complicated. There are three possibilities....
I coded all hypothetical ancestors
under this condition as missing data” (Wiley,
1987: 301-302). 
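
The two styles of character described above can be sketched as follows. This is a minimal illustration, assuming the usual coding in which a node character scores 1 for areas within the node, 0 for areas elsewhere in the same cladogram, and ? for areas where the group does not occur, whereas a species-distribution character scores every area as 1 or 0. The area names and function names are hypothetical.

```python
def node_character(node_areas, cladogram_areas, all_areas):
    """Character for a cladogram node, with missing data.

    1 = area within the node; 0 = area elsewhere in the cladogram;
    ? = area where the group does not occur at all.
    """
    column = {}
    for area in all_areas:
        if area in node_areas:
            column[area] = "1"
        elif area in cladogram_areas:
            column[area] = "0"
        else:
            column[area] = "?"
    return column

def distribution_character(species_areas, all_areas):
    """Character for a species distribution, without missing data."""
    return {a: ("1" if a in species_areas else "0") for a in all_areas}

# Hypothetical example: five areas, a group occurring in four of them.
ALL_AREAS = ["A", "B", "C", "D", "E"]
node = node_character({"B", "C"}, {"A", "B", "C", "D"}, ALL_AREAS)
dist = distribution_character({"B", "C"}, ALL_AREAS)

print("".join(node[a] for a in ALL_AREAS))  # node character
print("".join(dist[a] for a in ALL_AREAS))  # distribution character
```

For area E, where the hypothetical group is absent, the node character is ? but the distribution character is 0. That difference is the one at issue in the discussion of missing data.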

Wiley (1988a) later renamed the practice 
TABLE 20
Multistate (MS) and Two-State (TS) Characters for Cladogram Nodesᵃ
MS TS 

Nodes of 0000111110001010111111222« «122222 

Cladograms: 457817788 49 50 73 8459 16 7023 814567 

OUTGROUP 000100100 00 00 00 0000 00 0000 000000 
01 KIAMICHI 23232727277 111111101177 97797 2272927? 
02 OUACHITA 232727747 1111411279777 «1177977? «111010 
03 MIDDLE ARKANSAS 232377743 111111101177 797797 111111 
04 UPPER ARKANSAS 272377747 11727 11101177 7777 111010 
05 WHITE 232321032 11411411:10111121101111101 
06 ELEVEN POINT 232221032 1111111010 111101111101 
07 CURRENT 232221032 11114111010 111101111101 
08 BLACK 232221032 1111111010111101111101 
09 ST FRANCIS 232227777 111111:10104117°777 2722727? 
10 OSAGE 232271031 11411411:101077 1101 111100 
11 GASCONADE 232271031 111111101077 1101 111100 
12 MERAMEC 232271077 111111101077 1101 272777? 
13 SALT (MISSOURN 277777777 117727977777 7977972727 7727227? 
14 DES MOINES 22792 T37 PL Pe EL Pe aa a aaa 
15 UPPER MISSISSIPPI 2727777727 112727 11297277 7292 792777 297222277 
16 ILLINOIS 272?272797777 112792772 7972797 797 797977 79227227227? 
17 KASKASIA QP PEP LE VL SOD Pee? PP SP eT TSE? 
18 BIG MUDDY ed as ad dag es ay Jaks ee Dts Ugh! se i a A ol a ya i dE oak dey es A i 
19 WABASH 237?727777 111177 797777 117972977 27272977? 
20 MIAMI 2277-21272 LLP? 27 92277-7727 L110 777777 
21 SCIOTO 277771277 1177797797977 97 11107772727? 
22 UPPER OHIO ZIV PPA 29? Pee? eta? Lb 227279 
23 LOWER KANAWHA ?777271277 77 77 777777 7297 1110772772? 
24 UPPER KANAWHA ?7?7771377 7277 7277977777? 77 1110722272777 
25 BIG SANDY 777771277 72977972 7977972727 797111027777? 
26 LICKING 7977979712772 7979 7972 72779797 797? 1110777777 
27 KENTUCKY 22 PEP QE PPLE Pee EPP? D0? Be? 2 
28 SALT (KENTUCKY) 727777777 7277 11797 7972727 7977979727 7297279 
29 GREEN R227 2 2772-10 11 P1222 Pept eT Pea 
30 CUMBERLAND Ui? a ie ie ee Ze | cd Wc cat Ue Ja? i! i A ol A? le A de i le QJ 
31 DUCK 122771020 10114117777 77 1000 110000 
32 LOWER TENNESSEE ?220710207?7 1111110077 1000 110000 
33 UPPER TENNESSEE 722021077? 77 11111100411 1000 72277277? 
34 MOBILE BASIN 01101701077 1010110010777?7 100000 


ᵃ Nodes 4-18 (left), MS characters for cladograms of seven species groups (after Mayden,
1988: 336-337, table 1). Nodes 4-27 (right), TS characters equivalent to corrected MS
characters. MS4 is correctly represented by TS4 + 9; MS5, by TS5 + 10; MS7, by TS7 + 13; MS8,
by TS8 + 14 + 15 + 19; MS11, by TS11 + 16; MS17, by TS17 + 20 + 22 + 23; MS18, by
TS18 + 21 + 24 + 25 + 26 + 27. The correction by Nelson and Ladiges (1991a: 57) replaces
0-entries by 1-entries for areas 31-33 (MS17, character 2); and one 0-entry by a ?-entry for
area 34 (MS17, character 2).


as “Brooks Parsimony Analysis” (BPA) and 
maintained the different style of characters 
for species distributions and cladogram nodes 
(Wiley, 1988a: tables 3, 4; 1988b: table 3). 
The practice was continued by Wiley et al. 
(1991: table 7.4). Characters with missing data 
were extended to species distributions by Page 
(1990: 124, table 2) in order to make “explicit 


the relationship between the cladogram and 
the matrix,” and without comment by Brooks 
(1990: tables 12, 14-15) and Funk and Brooks 
(1990: tables 9, 10). Nelson and Ladiges 
(1991b: 473) noted that “assumption zero
treats a widespread taxon as if it were a node 
relating the areas in which the taxon occurs.” 
Humphries (1992: fig. 9.15) depicted wide- 
Fig. 32. Subtrees and area cladogram for beetles
of the genus Platynus (modified from Page and 
Lydeard, 1994: figs. 9, 10). Symbols: CA, Central 
America; CH, central and northern Hispaniola; 
EC, eastern Cuba; J, Jamaica; LA, Lesser Antilles; 
NoCA, northern Central America; SH, southern 
Hispaniola; WC, western Cuba. In Subtree 4 are 
numbers showing redundancy of areas distal to 
paralogous node 4 (see fig. 33). 


spread species as terminal cladogram nodes 
and represented widespread species in a ma- 
trix by characters with missing data. 


MISSING DATA 


A matrix with missing data (e.g., table 13) 
invites existing parsimony programs to find 
trees that have shortest length but are over- 
resolved for data of the matrix (see Methods). 
Of trees found for the matrix of table 13, for 
example, there are 64 trees with 28 infor- 
mative nodes—for a matrix of 15 two-state 
characters and 100% consistency! In this case, 
programs did not find the minimal tree (eight 
nodes) before filling computer memory with 
overresolved trees. 




Consider again Mayden’s original matrix, 
which for species distributions includes char- 
acters without missing data. During parsi- 
mony analysis of this matrix, accumulation 
of overresolved trees in computer memory is 
inhibited. Optimization of characters for spe- 
cies distributions permits an informative re- 
sult without search for a least resolved tree. 
For example, the first of 165 trees found for 
the consolidated matrix has 20 informative 
nodes. Of 28 species, 11 are endemic, leaving 
17 widespread species. Characters for these 
17 species distributions optimize (13 of the 
17 as homoplasies) such that homoplasies de- 
fine 14 of the 20 nodes of the tree. With dif- 
ferential weighting in favor of cladogram 
nodes, characters for species distributions in- 
hibit accumulation of overresolved trees, with 
consequent lack of resolution in their strict 
consensus. This problem was alluded to by 
Wiley (1988b: 530): 


R. L. Mayden’s ... analysis of faunas of areas of 
endemism in the Central Highlands of North America 
includes a number of “relic” areas that contain only 
a small number of the total groups in the analysis. To 
counter false placement, he was forced to take these 
areas out of the initial analysis and treat them as one 
might treat a poorly preserved fossil, that is, adding 
them in a post hoc manner according to the evidence 
actually associated with them. 


Wiley’s “‘relic’’ areas refer, among others, to 
areas 13, 16-18, 23-26, and 28, each of which 
is part of the distribution only of 1 wide- 
spread species of the 29 fishes of Mayden’s 
study and which therefore occurs only in one 
of the seven species groups. Areas 13 and 16-18,
for example, are part of the distribution
of Nocomis biguttatus and no other species 
(fig. 18, species 10); areas 23-26 are part of 
the distribution of Etheostoma variatum (fig.
18, species 19); area 28 is part of the distri- 
bution of Fundulus catenatus (fig. 18, species 
13). As such these areas appear only as miss- 
ing data in the component characters for the 
nodes of six of the seven species groups. The 
tactic described by Wiley would have as its 
effect the elimination of some missing data 
from the matrix, but that tactic was not used 
by Mayden in his published analysis. Dele- 
terious effects of missing data were inhibited 
by binary characters for species distributions 
(see above) and, before parsimony analysis, 
by eliminating from the matrix areas with 
Fig. 33. Relationships of 33 species of beetles
of the genus Platynus (modified from Page and 
Lydeard, 1994: fig. 8; after Liebherr, 1988: fig. 6). 
Symbols as in figure 32. 


duplicate character strings: “The number of 
equally parsimonious trees was reduced 
somewhat in this analysis by consolidating 
drainages with the same character strings” 
(Mayden, 1988: 335). Nelson and Ladiges 
(1991a: fig. 3) found that the full matrix yields
about 1500 trees, and the matrix for consol- 
idated drainages yields 33 trees—a reduction 
of about 98%. 


GENERAL DISCUSSION 


SUBTREES AND SUBTREES 


In “Designing a Biogeographic Study,” Page 
and Lydeard (1994) considered an empirical 
example, one of Liebherr’s (1988) taxon-area 
cladograms for beetles of the genus Platynus. 
They noted: 


if we decompose the cladogram into subtrees that 
minimize redundancy (this can be likened to identi- 
fying the sets of biogeographically orthologous taxa; 
see Nelson and Ladiges, 1991b: 481; and Page, 1993), 
we see that all the subtrees are mutually consistent 
... [fig. 32, Subtrees 1-4]; that is we could combine 
them all to create one or more area cladograms that 


all subtrees could agree with. Figure ... [32, Area 
Cladogram] shows the area cladogram for Platynus 
that has the fewest items of error (Nelson and Plat- 
nick, 1981; Page, 1990). 


Their comment approaches the core of our 
present effort, but they describe a different 
purpose: “‘to create one or more area clado- 
grams that all subtrees could agree with.” The 
purpose of subtree analysis as implemented 
here is to specify the data relevant to cladistic 
biogeography—data that might conflict 
among themselves or not conflict as the case 
might be. Subtree analysis of their empirical 
example leads to a different result. 

The cladogram for Platynus (fig. 33) in- 
cludes 33 species, all but one of which (spe- 
cies 17) are endemic to one of eight areas in 
the region of Central America and the An- 
tilles (species 17 is widespread in two of the 
areas). The cladogram reduces to Subtrees 1- 
4, which include eight informative nodes (fig. 
34). 

Subtrees 1-4 are similar to the subtrees of
Page and Lydeard, but there are differences. 
Their four subtrees do not include Subtree 2. 
Their subtrees together include only five in- 
formative nodes (fig. 32), four of which are 
among the eight nodes of Subtrees 1-4 (the 
informative node of their subtree 1 is node 
23 of Subtree 1; that of their subtree 2 is node 
TABLE 21
Component and Three-item Matrices for Subtrees of Beetlesᵃ

Nodes Nodes of Subtrees 

01111122 0000 111 111 11 11 11 2 22 
91568923 9999 111 555 66 88 99 2 33 
OG 00000000 0000 000 000 00 00 00 0 00 
LA 00000000 0000 770 2770 00 00 70 0 70 
J 297717027 229277: 277:°79797 :*«11:°797: 07:27:72 
CH 10171771 11797 0297 111792 127977 201 
SH 10171721 292711 707 11172 7197272141 
BC 11077927277 2171 111 707 72: 77:797:2:°22 
we 110772792 121? 111 0277 77:27:97 2:2? 
NoCA77711110 7277 777 72727: 12 111110? 
CA 227717117 77297 :777:2997:71:°797:*«112-:1«297 


ᵃ Left, component matrix for eight informative nodes
of four subtrees (fig. 34). Right, three-item matrix for 
eight informative nodes of four subtrees (fig. 34). Three- 
item matrix (right) is not derivable from component 
matrix (left). Symbols as in figure 32. 


19 of Subtree 3; the two informative nodes 
of their subtree 3 are nodes 11 and 15 of 
Subtree 4). Their subtree 4 has one infor- 
mative node, which is node 4 of the clado- 
gram for Platynus (fig. 33); that node is par- 
alogous and without equivalent in Subtrees 
1-4 (fig. 34).

Parsimony analysis of a component matrix 
(table 21, left) for the eight nodes of Subtrees 
1-4 (fig. 34) yields 96 trees, length 8, ci 100, 
ri 100. A strict consensus of the 96 trees has 
one node grouping all areas save the Lesser 
Antilles (LA). Search for a minimal tree re- 
veals one tree (fig. 35), length 8, ci 100. Three- 
item analysis of the eight nodes yields 19 
statements (table 21, right). Parsimony anal- 
ysis of a uniformly weighted matrix yields 96 
trees, length 19, ci 100, ri 100, with the same 
strict consensus and the same minimal tree. 
Parsimony analysis of a fractionally weighted 
(×4) matrix yields the same results, length
72, ci 100, ri 100. This minimal tree of four 
nodes (fig. 35) is offered as an exact result of 
parsimony analysis of the matrix for the par- 
alogy-free fraction of data of all nodes of the 
cladogram for Platynus. 
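
How a single component resolves into three-item statements may be illustrated in general terms. The sketch below assumes the simple decomposition in which each pair of areas scored 1 is combined with each area scored 0, and areas scored ? contribute nothing. It enumerates all such statements, not only the independent ones, and it is not intended to reproduce the 19 statements of table 21, which are not derivable from the component matrix. The column and the function name are hypothetical.

```python
from itertools import combinations

def three_item_statements(column):
    """List three-item statements implied by one component character.

    `column` maps area -> "1", "0", or "?". Each statement is returned as
    (excluded_area, (area_1, area_2)), read as excluded(area_1 area_2).
    """
    inside = sorted(a for a, v in column.items() if v == "1")
    outside = sorted(a for a, v in column.items() if v == "0")
    return [(out, pair)
            for pair in combinations(inside, 2)
            for out in outside]

# Hypothetical component: areas B, C, D grouped relative to A; E is missing.
column = {"A": "0", "B": "1", "C": "1", "D": "1", "E": "?"}
statements = three_item_statements(column)
for out, (x, y) in statements:
    print(f"{out}({x}{y})")
```

For this hypothetical column the statements are A(BC), A(BD), and A(CD). Area E, being missing, enters none of them.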

Of the 96 trees of shortest length for the 
component data, the number of nodes varies 
from three to six. There are 45 trees of six 
nodes, among which is the tree that Page and 
Lydeard found to have the fewest items of 
error (fig. 32, Area Cladogram). Items of error 
are determined by all nodes of the relevant 
tree, in this case all 28 nodes of the tree of
figure 33. Of these 28 nodes, only 8 appear
as informative nodes of Subtrees 1-4 (nodes
9, 11, 15, 16, 18, 19, 22, 23), leaving 20 nodes
as paralogous. Items of error in this case are
determined predominantly by paralogous
nodes. The tree with fewest items of error is
overresolved for the data (see also below).
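
A simplified check for paralogous nodes can be sketched as follows. It assumes only the definition that a node is geographically paralogous when the distributions of the taxa it relates duplicate or overlap. It is not the subtree algorithm itself, and the small tree, area symbols, and function names are hypothetical.

```python
def areas(tree):
    """Set of areas occupied by all terminal taxa of a (sub)tree.

    A terminal taxon is a frozenset of areas; an internal node is a tuple
    of subtrees.
    """
    if isinstance(tree, frozenset):
        return set(tree)
    found = set()
    for child in tree:
        found |= areas(child)
    return found

def is_paralogous(node):
    """True if any area occurs in more than one branch of the node."""
    seen = set()
    for child in node:
        child_areas = areas(child)
        if seen & child_areas:
            return True
        seen |= child_areas
    return False

def count_nodes(tree):
    """Return (number of paralogous nodes, total number of nodes)."""
    if isinstance(tree, frozenset):
        return (0, 0)
    paralogous = 1 if is_paralogous(tree) else 0
    total = 1
    for child in tree:
        p, t = count_nodes(child)
        paralogous += p
        total += t
    return (paralogous, total)

def taxon(*names):
    return frozenset(names)

# Hypothetical cladogram: area J occurs in both branches of one node.
example = ((taxon("CA"), taxon("NoCA")),
           ((taxon("J"), taxon("CH")), (taxon("J"), taxon("SH"))))
print(count_nodes(example))
```

In the hypothetical example one of five nodes is paralogous, namely the node whose two branches both include area J.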


MINIMALITY 


The minimal tree for Platynus (fig. 35, 
Minimal Tree) has the interesting property 
of minimality relative to other trees of short- 
est length and fewer nodes (fig. 35, Non-Min 
Trees 1 and 2). Minimality, in the sense of 
“minimal tree” used here, does not mean
merely fewer nodes, nor even less informa- 
tion (as measured by the number of three- 
item statements). Fewer nodes and fewer 
three-item statements are, nevertheless, in- 
dications of minimality in many cases. 

Minimality relates to two interpretations 
of multiple branching (Nelson and Platnick, 
1980) and also to basic procedures of three- 
item analysis (Nelson and Ladiges, 1991b: 
table 6, Example 6). The example offered by 
the latter authors is the simplest possible: 
A(BC) + A(DE) = A(BC)(DE). The example 
stated in words is that if B and C are related 
more closely than to A, and D and E are 
related more closely than to A, then the min- 
imal tree describing both relationships is tree 
A(BC)(DE). There are many other nonmi- 
nimal trees that describe the two relation- 
ships, for example tree A(BCDE). Why is the 
former (with two nodes) minimal relative to 
the latter (with only one node)? 

A relevant observation is that an addition- 
al relationship, A(BD), changes the minimal 
tree from A(BC)(DE) to A(BCDE): A(BC) + 
A(DE) + A(BD) = A(BCDE). The latter tree, 
in other words, is determined by more rather 
than fewer data. Another observation is that 
no additional relationship—one different 
from but consistent with the three above— 
can change the minimal tree from A(BCDE) 
to A(BC)(DE). Addition of a relationship such
as A(BE) leaves the minimal tree unchanged; 
addition of (BC)D changes the minimal tree 
to A((BC)DE). A final observation is that the 
polytomy of tree A(BC)(DE) is basal, in con- 
trast to that of tree A(BCDE). According to 
Nelson and Platnick (1980), a polytomy renders
ambiguous any nodes distal to it. A tree
with a basal polytomy is more ambiguous 
than a tree with a dichotomous basal node. 
A minimal tree, being least resolved, is most 
ambiguous relative to other trees of shortest 
length for a particular data matrix. 
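
The example of trees A(BC)(DE) and A(BCDE) can be restated as a small sketch. It assumes that a tree describes a relationship such as A(BC) when some node of the tree includes B and C but excludes A. The representation of trees as nested tuples and the function names are ours, for illustration only.

```python
def leaves(tree):
    """Set of terminal areas of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def nodes(tree, is_root=True):
    """Area sets of all nodes of the tree, the root excluded."""
    if isinstance(tree, str):
        return []
    found = [] if is_root else [leaves(tree)]
    for child in tree:
        found += nodes(child, is_root=False)
    return found

def describes(tree, excluded, pair):
    """True if some node includes both areas of `pair` but not `excluded`."""
    return any(set(pair) <= n and excluded not in n for n in nodes(tree))

minimal = ("A", ("B", "C"), ("D", "E"))   # A(BC)(DE), two nodes
nonminimal = ("A", ("B", "C", "D", "E"))  # A(BCDE), one node

for name, tree in (("A(BC)(DE)", minimal), ("A(BCDE)", nonminimal)):
    print(name, len(nodes(tree)),
          describes(tree, "A", ("B", "C")),
          describes(tree, "A", ("D", "E")),
          describes(tree, "A", ("B", "D")))
```

Both trees describe A(BC) and A(DE). Only A(BCDE) also describes A(BD), a relationship for which the two original statements provide no evidence. In that sense the tree with one node asserts more than the tree with two.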

It was the view of Nelson and Platnick 
(1980) that minimal trees are implicit in sys- 
tematics because of the necessity of working 
with trees that, in reflecting the current state 
of knowledge, are not perfectly resolved— 
trees that always contain multiple branch- 
ings. We have attempted to make explicit use 
of minimal trees in the cladistic analysis of 
geographic data—an enterprise that, relative 
to systematics in general, is more problematic 
because geographic data are generally less suf- 
ficient, in the current state of knowledge, for 
full resolution of a tree of areas. 

Except for the minimal trees found here 
for North American fishes and tropical 
American beetles, how are systematic data 
for these two groups—which are well worked 
out by current standards—to be reliably un- 
derstood in a geographic sense? The inter- 
pretations offered by Mayden (1988) and Page 
and Lydeard (1994) are not only overre- 
solved for the data but, in the case of the 
fishes at least, in apparent conflict with them. 


CONSISTENCY 


The three studies whose results are analyzed
above (midges, fishes, beetles) are typical of
many efforts in modern systematics, which
attempt to determine detailed relationships
among organisms of a particular taxon. In a
cladistic sense, success of effort is measured in
the number and reliability of nodes of the
cladograms offered as a result. Nodes are
graphic representations of relationships and 
of taxa, too, whether taxa be formally named 
or not. The cladograms for midges, fishes, 
and beetles (figs. 8, 18, 33) are virtually fully 
resolved, and in that sense they are results of 
successful effort. Reliability of the nodes lies 
in another dimension of success, but that di- 
mension embraces future discovery, with its 
possibilities of confirmation and refutation.

A remarkable feature of subtree analysis of 
these cladograms is the near 100% consisten- 
cy of their geographic data as shown by par- 




Fig. 35. Minimal and nonminimal trees, all of 
shortest length, resulting from parsimony analysis 
of matrices (table 21) for nodes of beetle subtrees 
(fig. 34). Symbols as in figure 32. 


simony analysis of a matrix for the subtree 
nodes. Such consistency is not normally 
shown by results of parsimony analysis of 
systematic data. Nor is such consistency a 
reasonable expectation of any viewpoint 
based on an analogy between geographic and 
systematic data (Sober, 1988). 

When such consistency is claimed in a par- 
ticular case, the claim sometimes meets skep- 
ticism that such a result is obtainable without 
bias. An example is that of Edmunds (1981: 
fig. 6.16; comment in Nelson, 1982, 1984). 
In the 1979 symposium on vicariance bio- 
geography, Edmunds presented five exam- 
ples, one of scorpionflies (Mecoptera) and four 
of mayflies (Ephemeroptera). For each ex- 
ample, there are three genera, endemic to New 
Zealand, southeastern Australia, and southern
South America, in which “the New Zealand
genus is a sister group of the Australian-Magellanic
[South American] pair.” The pattern
of distribution is one of the two types
relevant to New Zealand noted by Brundin
(see above): “a group in New Zealand is the
sister group of a group occurring... in South
America and Australia.” Edmunds (1981:
Fig. 36. Relationships of 25 genera of mayflies
(after Edmunds, 1981: fig. 6.17), then of “family
Siphlonuridae and part of family Oligonuridae”
and subsequently subdivided into Siphlonuridae 
(node 6), Oniscigastridae (node 7), Ameletopsidae 
(node 4), and Coloburiscidae (node 2; e.g., Peters 
and Campbell, 1991). Symbols: A, Australia; N, 
Nearctic; P, Palearctic; S, South America; Z, New 
Zealand. 


293) commented, without other justification, 
that 


The question concerning the degree to which concor- 
dant cladograms were selected or “plucked” came up
both in the auditorium and in private discussion. To 
allow persons to judge in the cases cited above I pres- 
ent a cladogram (figure 6.17) of the members of a 
highly paraphyletic group [of mayflies] considered to 
be one family at the time of analysis. Three other 
families— Baetidae, Caenidae, and Leptophlebidae— 
remain to be analyzed. The Nannochoristidae [Me- 
coptera] (figure 6.16) are a selected example suggesting 
that entire cool adapted lotic water communities were 
vicariated. 


Edmunds’ cladogram of mayflies (fig. 36) re- 
duces to four subtrees (fig. 37), which for the 
southern areas duplicate his four examples. 
Remarkably, parsimony analysis of a matrix 
for all 24 nodes of the cladogram of mayflies 
(component data, table 22, above left) and of 
a matrix for the 7 informative nodes of the 
subtrees (component or three-item data, ta- 
ble 22, above right and below) yields one and 
the same tree with 100% consistency (the same 
as Subtree 1 of fig. 37). Edmunds’ four ex- 
amples and the consistency of their geograph- 
Fig. 37. Subtrees derived from cladogram of
mayflies (fig. 36). Symbols as in figure 36. 


ic data are, therefore, an unbiased represen- 
tation of the cladogram of mayflies (fig. 36). 

In this case, again by chance, the compo- 
nent data for all 24 nodes of the cladogram 
of mayflies (fig. 36) eliminate redundancy and 
paralogy such that, with parsimony analysis, 
a satisfactory result is achieved. Three-item 
analysis of the 24 nodes of the mayfly clado- 
gram yields 1966 statements. Parsimony 
analysis of three-item matrices (uniformly and 
fractionally weighted ×8) yields a different
tree, S(A(Z(NP))), lengths 2964 and 22597, 
respectively, ci 66, ri 49. The different result 
with reduced consistency is attributable en- 
tirely to the effects of paralogy, as captured 
by the three-item data. 


REVIEW OF MORRONE AND CARPENTER 
(1994) 


For cladistic biogeography, Morrone and 
Carpenter (1994) reviewed available meth- 
ods and computer implementations. They 
analyzed 10 data sets previously published, 
found that different methods give widely dif- 
ferent results, and concluded that ‘“‘current 
computer implementations of the methods 
remain unsatisfactory” (p. 114). 

It is not necessary to apply subtree analysis 
to all 10 data sets in order to discover that 
most of the variability of their results stems 
not from different methods as such but rather 
ᵃ Above left, component matrix for 24 nodes of cladogram
for mayflies (fig. 36). Above right, component matrix
for seven informative nodes of four subtrees (fig.
37). Below, three-item matrix for seven informative nodes 
of four subtrees. Symbols: OG, Outgroup; A, Australia; 
N, Nearctic; P, Palearctic; S, South America; Z, New 
Zealand. 


from the effects not only of paralogy as vari- 
ably captured by different methods, but also 
of missing data that cause programs to save 
overresolved trees. We apply subtree analysis 
to five of their data sets. 


GERM (FIG. 38)


Three genera of weevils include species en- 
demic to the subantarctic domain of southern 
South America (Morrone, 1992a, 1992b, 
1993a; summary in Morrone et al., 1994). 
Twenty-five species occur among four areas. 
The combined cladograms (fig. 38, left) re- 
duce to two identical subtrees of three areas 
only (fig. 38, right). The variable results found 
by Morrone and Carpenter (1994: tables 1, 
4) include all 15 possible fully resolved trees 
for four areas. In a subsequent publication, 
Morrone and Anderson (1995) describe ad- 
ditional species and revise relationships 
within one genus (Falklandius, fig. 38, node 
3), removing from it one species (fig. 38, spe- 
cies 25) and placing that species in a position 


fig. 52); 2, Antarctobius (after Morrone, 1992b: fig. 
36); 3, Falklandius (after Morrone, 1992a: fig. 47). 
The corresponding cladograms of Morrone and 
Carpenter (1994: fig. 1) show fewer terminal taxa 
but preserve geographic data relevant to subtree 
analysis. Right, two subtrees derived from com- 
bined cladograms (left). Symbols: I, Islas Malvi- 
nas; M, Magellanic forest; P, Magellanic moor- 
land; V, Valdivian forest. 


basal relative to all other species of the genus. 
These changes eliminate the second of the 
two subtrees and thereby diminish the results 
obtained by subtree analysis of cladograms 
of these organisms. 
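
The figure of 15 possible fully resolved trees for four areas follows from the standard count of rooted, fully dichotomous trees for n labeled terminals, (2n - 3)!! = 1 × 3 × 5 × ... × (2n - 3). A minimal sketch, with a function name of our own choosing:

```python
def fully_resolved_rooted_trees(n_areas):
    """Number of rooted, fully dichotomous trees for n labeled areas.

    Uses the double factorial (2n - 3)!! = 1 * 3 * 5 * ... * (2n - 3).
    """
    if n_areas < 2:
        raise ValueError("at least two areas are required")
    count = 1
    for factor in range(3, 2 * n_areas - 2, 2):
        count *= factor
    return count

for n in (3, 4, 5):
    print(n, fully_resolved_rooted_trees(n))
```

For four areas the count is 15, so results that include all 15 trees exclude no possible fully resolved arrangement of the four areas.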


INDO (FIGS. 39, 40)


Ten groups of plant bugs include species 
endemic to Africa and Australasia (Schuh and 
Stonedahl, 1986). Forty-six species and spe- 
cies groups occur among 15 areas. The com- 
bined cladograms (fig. 39, left) reduce to 11 
subtrees (fig. 39, right). A minimal tree (fig. 
40, middle) derived from parsimony analysis 
of a component matrix for subtree nodes is 
similar to the hand resolution of Schuh and 
Stonedahl (fig. 40, above), which proves ov- 
erresolved for areas 3 and 7 (S India and N 
Burma). A minimal tree (fig. 40, below) de- 
rived from parsimony analysis of three-item 
matrices (no difference between uniform and 
fractional weighting) includes a division be- 
Fig. 39. Left, combination of cladograms for 10 genera and species groups of plant bugs (Insecta,
Heteroptera, Miridae) of the Indo-West Pacific Region (after Schuh and Stonedahl, 1986: figs. 5-7).
Cladograms (nodes): 3, Auricillocoris; 4, Ctypomiris group; 5, Sejanus ecnomious species group; 6,
Leucophoroptera philippinensis species group; 7, Dioclerus; 8, Myiocarpus; 9, Mertila; 10, Harpedona; 
11, Prodromus; 12, Thaumastomiris. Other relevant nodes: 0, Miridae; 1, Phylinae; 2, Eccritotarsini. 
All widespread terminal taxa are species (4, 8, 10, 23, 25, 27, 40, 43). Right, 11 subtrees derived from 
combined cladograms (left). Abbreviations: Borneo, northern Borneo; N & S Phil, northern and southern 


Philippines; Thailand, northwestern Thailand. 


tween areas of the Asian mainland (areas 2, 
4-6, 9) and those of the Indo-Australian Archipelago
(areas 8, 10-15). These three results
(fig. 40, above, middle, below) are signifi- 
cantly consistent among themselves. Mor- 
rone and Carpenter (1994: table 6) find that 
most methods yield numerous trees (up to 
5455) for these data. 


MASA (FIG. 41) 


Eighteen genera and one tribe of two sub-
families of wasps include species endemic to
seven continental areas of the world (Car-
penter, 1993). The combined cladograms (fig.
41, left) reduce to 11 subtrees (fig. 41, right). 
Carpenter (1993) and Morrone and Carpen- 
ter (1994) noted that the geographic patterns 
shown by the two subfamilies (fig. 41, nodes 
1 and 2, Masarinae and Polistinae, respec- 
tively) are different. The corresponding sub- 
trees comprise two groups, one of five sub- 
trees (fig. 41, subtrees including terminal taxa 
1-14) and one of six subtrees (fig. 41, subtrees 
including terminal taxa 23-35). Parsimony 
analysis of a matrix for nodes of subtrees of
each group yields one minimal tree with 100%
consistency: A(S(FNP)) for Masarinae, 
(NS)(ACFO) for Polistinae. Carpenter (1993: 
153) construed the different patterns to reflect 
the relative age of origin of the two groups 
(Masarinae older, Polistinae younger): “North
America and Australia may have been
reached after breakup [of Gondwana] via dis-
persal in the paper wasps [Polistinae].” Mor-
rone and Carpenter (1994: table 6) report
varying results (up to 1000 trees) with various 
methods. 
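
The length, ci, and ri figures quoted in these analyses can be related by the usual definitions of the consistency index (minimum possible steps divided by observed length) and the retention index. The following is a minimal sketch under stated assumptions, not the procedure of any particular program: the function name `fit_indices` is hypothetical, the inputs in the usage lines are illustrative values rather than data from the studies, and truncation to whole percentages is assumed.

```python
def fit_indices(length, min_steps, max_steps):
    """Return (ci, ri) as whole percentages, truncated (assumed convention).

    length    -- observed number of steps on the tree
    min_steps -- fewest steps the matrix could require on any tree
    max_steps -- most steps the matrix could require on any tree
    """
    ci = 100 * min_steps // length
    if max_steps == min_steps:
        ri = None  # retention index is undefined when no homoplasy is possible
    else:
        ri = 100 * (max_steps - length) // (max_steps - min_steps)
    return ci, ri


# Hypothetical inputs: a perfectly consistent matrix and one with two extra steps.
perfect = fit_indices(length=8, min_steps=8, max_steps=12)
imperfect = fit_indices(length=10, min_steps=8, max_steps=16)
print(perfect, imperfect)
```

A result of 100% consistency, as reported for each wasp subfamily, corresponds to a tree whose length equals the minimum number of steps.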


BIRD (FIGS. 42, 43)


Eight genera and species groups of birds of 
five families include species endemic to trop- 
ical South America (Cracraft, 1988). Forty 
species occur among nine areas. According 
to Cracraft, there are two geographic patterns. 
Four groups (fig. 42, left; nodes 5, 7, 17, 18) 
show variants of the pattern NE(SE(SW 
NW))—termed “Guianan-Amazonian” (p.
222, pattern 1, his fig. 2)—with other areas 
variously represented (fig. 42: SEB, CA, CHO, 
NC, IM in node 5; SEB, CA, CHO, IM in 
node 7; IM in node 18). Area SEB occurs in 
two groups, once basally (node 5), once ter- 
minally (node 7). Four other groups (nodes 
3, 6, 11, 12) show the pattern (NE NW)(SE 
SW)—termed “Trans-Amazonian” (p. 224, 
pattern 2, his fig. 3)—with no other areas in- 
cluded. 

Cracraft treated the eight groups as sub- 
trees, and represented their component data 
and species distributions in complete matri- 
ces: one matrix for each of the two assem- 
blages of four groups and one matrix for all 
eight groups. Parsimony analysis of the com- 
plete matrix for the pattern 1 assemblage 
(nodes 5, 7, 17, 18, informative characters 
only) yields three trees (fig. 43, P1Trees 1-3),
length 20, ci 80, ri 85, all showing pattern
1: NE(SE(SW NW)). Parsimony analysis of
the complete matrix for the pattern 2 assem- 
blage (nodes 3, 6, 11, 12, informative char- 
acters only) yields one tree, length 8, ci 100, 
ri 100, showing pattern 2: (NE NW)(SE SW). 
Parsimony analysis of the complete matrix 
for both assemblages (informative characters 
only) yields five trees, length 35, ci 68, ri 68,
all showing pattern 2: (NE NW)(SE SW). Cra- 
craft (1988: 229) concluded that 

Fig. 40. Area cladograms for plant bugs. Above, 
after Schuh and Stonedahl (1986: fig. 9). Middle, 
minimal tree from parsimony analysis of com- 
ponent matrix for 21 nodes of 11 subtrees (fig. 39, 
right), length 23, ci 91, ri 93. Below, minimal tree 
from parsimony analysis of three-item matrix (87 
statements) for 11 subtrees (fig. 39, right), length 
99, ci 87, ri 86 (no difference between uniformly 
and fractionally weighted matrices). 


Parsimony analysis has resulted, essentially, in a sin- 
gle historical hypothesis for the core areas of endem- 
ism within the Amazon basin [NE, NW, SE, SW]. 
These results are analogous in many respects to cases 
in which character distributions are ambiguous and
narrowly favor one systematic hypothesis over an- 
other. 


Cracraft (1988: 221, 230) found the result (of 
parsimony analysis of the complete matrix 
for both assemblages) unsatisfactory because 


Current methods ... are reductionist in the sense that
they attempt to resolve multiple conflicting patterns
across species-cladograms to a singular, less complex
pattern. Parsimony analysis ... is designed to reduce
the complexity of multiple patterns to one (or more)
most parsimonious hypothesis. Because it relies upon
a questionable analogy to methods in systematics,
however, biogeographic parsimony analysis has the
potential to obscure the history of a biota rather than
reveal it.


Fig. 41. Left, combination of cladograms for 18 genera and one tribe of two subfamilies of wasps
(Insecta, Hymenoptera, Vespidae, Masarinae and Polistinae) with worldwide distribution (after Car-
penter, 1993: figs. 7.6, 7.8). Cladograms (nodes): 1, Masarinae; 2, Polistinae. Widespread terminal taxa
treated as nodes: Ceramius (12), Jugurtia (24), Celonites (26), Quartinia (27), Polistes (5), Epiponini (9),
Mischocyttarus (10), Polybioides (14), Belonogaster (18), Ropalidia (19), Parapolybia (20). Right, 11
subtrees derived from combined cladograms (left). Symbols: A, Australia; C, Far East (China, Korea);
F, Africa; N, North America; O, Oriental; P, Palearctic; S, South America.


From this standpoint, therefore, a worthy 
(“nonreductionist”?) method should have
found not one geographic pattern but two or 
more patterns (see below). 

The combined cladograms (fig. 42, left) re- 
duce to eight subtrees (fig. 42, right) corre- 
sponding to Cracraft’s eight species groups 
and their geographic data. Many other groups 
of birds occur in these areas of South Amer- 
ica, and those of the combined cladograms 
are only a sample. How the sampling was
accomplished is unexplained. Unlike Ed-
munds’s example of mayflies (see above), it is
not possible to ascertain that the groups in 
the combined cladograms, and in the subtrees 
derived from them, are an unbiased repre- 
sentation of the entire cladogram of Ama- 
zonian birds. Rather, the groups seem to have 
been selected, or “plucked,” for the purpose
of illustrating Cracraft’s argument and are 
hardly different from a hypothetical example 
concocted for the same purpose. 

The subtrees may, nevertheless, be sorted
into two classes, corresponding to Patterns 1
and 2. Parsimony analysis of a component
matrix for nodes of Pattern 1 subtrees yields
three trees (fig. 43, P1Trees 1-3), with a strict
consensus showing a basal polytomy among
areas 1, 2-3, 4, and 5-8 (fig. 43, Con3). Par-
simony analysis of three-item matrices for
nodes of Pattern 1 subtrees yields either (uni-
formly weighted matrix) one tree (fig. 43,
P1Tree 3) or (fractionally weighted × 12
matrix) two trees (fig. 43, P1Trees 2 and 3). Par-
simony analysis of a component matrix for
nodes of Pattern 2 subtrees yields one min-
imal tree (fig. 43, P2Tree), and three-item
matrices yield the same. Parsimony analysis
of a component matrix for nodes of all eight
subtrees yields 10 trees, all showing Pattern
2, with a strict consensus of only three nodes
(fig. 43, Con10). Parsimony analysis of a
three-item matrix for nodes of all eight sub-
trees yields either (uniformly weighted ma-
trix) one tree (fig. 43, P1Tree 3) or (fraction-
ally weighted × 12 matrix) two trees (fig. 43,
P1Trees 2 and 3).


Fig. 42. Left, combination of cladograms for eight genera and species groups of birds of tropical South
America (after Cracraft, 1988: figs. 2, 3). Cladograms (nodes): 3, Psophia; 5, Pionopsitta; 6, Pionites; 7,
Selenidera; 11, Lanio; 12, Pipra; 17, Pteroglossus viridis group; 18, Pteroglossus bitorquatus group.
Other relevant nodes: 1, Psittacidae; 2, Ramphastidae; 4, Passeriformes; 8, Pteroglossus. Right, eight
subtrees derived from combined cladograms (left). Symbols: CA, Central America; CHO, Choco; IM,
Imeri; NC, Nechi; NE & NW, northeastern and northwestern Amazonia; SE & SW, southeastern and
southwestern Amazonia; SEB, southeastern Brazil.

The results of parsimony analysis of a com-
ponent matrix for nodes of all subtrees are
hardly different from those obtained by Cra-
craft (the corresponding matrices are nearly
the same). The results of parsimony analysis
of a three-item matrix for nodes of all sub-
trees differ, showing Pattern 1 rather than
Pattern 2. That difference exemplifies Cra-
craft’s concern that ambiguous characters,
depending on the method used, might “nar-
rowly favor one ... hypothesis over anoth-
er,” but the results jointly recover both Pat-
terns 1 and 2 from the entire data and in that
sense function in this case as the “nonred-
uctionist methodology” that Cracraft sought
for but did not attain:


Biogeographic methodology needs to develop analyt-
ical techniques in order that complex historical pat-
terns are not concealed by estimates that are very
much less complex (p. 233).


Fig. 43. Results of parsimony analysis of matrices for subtrees derived from combined cladograms of
eight groups of birds (fig. 42). P1Trees 1-3, of component matrix for subtrees (13 nodes only) corre-
sponding to Pattern 1 (subtrees with basal nodes 5, 7, 17, 18), length 16, ci 81, ri 87. Con3, strict
consensus of P1Trees 1-3. P2Tree, minimal tree, from parsimony analysis of component matrix for
nodes of subtrees (eight nodes only) that correspond to Pattern 2 (subtrees with basal nodes 3, 6, 11,
12), length 8, ci 100, ri 100. Con10, strict consensus of 10 trees, from parsimony analysis of component
matrix for 21 nodes of eight subtrees, length 31, ci 67, ri 68. Parsimony analysis of three-item matrices
(129 statements) for Pattern-1 subtrees (eight nodes only) yields either P1Tree 3 (uniformly weighted
matrix), length 146, ci 88, ri 86, or P1Trees 2 and 3 (fractionally weighted × 12 matrix), length 1494,
ci 89, ri 88. Parsimony analysis of three-item matrices (145 statements) for all 21 nodes of subtrees
yields either P1Tree 3 (uniformly weighted matrix), length 174, ci 83, ri 80, or P1Trees 2 and 3
(fractionally weighted × 12 matrix), length 1830, ci 83, ri 80.


Morrone and Carpenter (1994: table 5) found 
that some methods yield only one tree, and 
most methods two or three trees, for these 
data. Seven different trees were found (their 
fig. 7), all, curiously, showing Pattern 1: 
NE(SE(SW NW)). 


LIST (FIG. 44) 


Two genera and one species group of a third
genus of weevils and one species group of 
asters include species endemic to southern 
South America (Morrone, 1993b; Lanteri, 
unpub.; Anderberg and Freire, 1991). Forty- 
two terminal taxa occur among four areas. 
The combined cladograms reduce to five sub- 
trees, which conflict among themselves. Par-
simony analysis of a component matrix for
nodes of all subtrees yields two trees,
B(D(AC)) and B(A(CD)), length 8, ci 75, ri 
66, with strict consensus B(ACD). Parsimony 
analysis of three-item matrices (eight state- 
ments) for all nodes of subtrees yields the 
same, length 10, ci 80, ri 75 (no difference 
between uniformly and fractionally weighted 
matrices). Morrone (1993b: fig. 6) found one 
tree, A(B(CD)). With a variety of methods, 
Morrone and Carpenter (1994: table 6) found 
either one tree or three trees. In their analyses 
of each of the four groups, they found several 
trees, usually all of the 15 possible resolved 
trees for four areas. Subtree analysis, in con- 
trast, yields a single tree for each of the four 
groups: subtrees 1 and 2 combine with 100% 
consistency as D(A(BC)), which conflicts with 
the other three subtrees, which are 100% con- 
sistent and combine as B(A(CD)). If conflict 
is reckoned as evidence of more than one
geographic pattern, then in these data there
is evidence of two patterns.


Fig. 44. Left, combination of two genera and one species group of weevils (Insecta, Coleoptera,
Curculionidae) and one genus group of asters (Asteraceae, Gnaphalieae) of southern South America.
Cladogram (nodes): 2, Lucilia genus group (after Anderberg and Freire, 1991); 3, Listroderes (after
Morrone, 1993b); 4, Naupactus taeniatulus species group (after Lanteri, personal commun. to Morrone,
1993b: 402); 5, Hyperoides (after Morrone, 1993c). Other relevant node: 1, Curculionidae. The corre-
sponding cladograms of Morrone and Carpenter (1994: fig. 3) show fewer terminal taxa but preserve
geographic data relevant to subtree analysis. Right, five subtrees derived from combined cladograms
(left). Symbols: A, central Chile; B, subantarctic; C, central Argentina; D, Chaco.
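
The conflict between the two combined results named in the LIST analysis, D(A(BC)) and B(A(CD)), can be made explicit by listing the three-item statements each tree implies. The following is a minimal sketch under stated assumptions: trees are written as nested tuples, the function names are hypothetical, and the two trees are the combined results quoted in the text, not the five subtrees of fig. 44 themselves. No claim is made that the counts below reproduce the matrices actually analyzed.

```python
from itertools import combinations


def leaves(tree):
    """Set of area labels under a node (a string is a terminal area)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def three_item_statements(tree):
    """Statements z(xy), stored as (z, frozenset({x, y})), implied by a rooted tree."""
    everything = leaves(tree)
    statements = set()

    def walk(node):
        if isinstance(node, str):
            return
        inside = leaves(node)
        # Every pair inside the clade is closer to each other than to any area outside it.
        for pair in combinations(sorted(inside), 2):
            for outsider in everything - inside:
                statements.add((outsider, frozenset(pair)))
        for child in node:
            walk(child)

    walk(tree)
    return statements


def conflicting_triples(a, b):
    """Area triples that the two statement sets resolve in different ways."""
    resolved = {pair | {out}: (out, pair) for out, pair in a}
    clashes = set()
    for out, pair in b:
        triple = pair | {out}
        if triple in resolved and resolved[triple] != (out, pair):
            clashes.add(triple)
    return clashes


tree_1 = ("D", ("A", ("B", "C")))  # D(A(BC))
tree_2 = ("B", ("A", ("C", "D")))  # B(A(CD))
first = three_item_statements(tree_1)
second = three_item_statements(tree_2)
clashes = conflicting_triples(first, second)
print(len(first), len(second), len(clashes))  # prints: 4 4 4
```

Each tree implies four statements, and the two trees disagree on every one of the four possible triples of areas, which is one way to see why they cannot be combined into a single pattern.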


THE PAST AND THE FUTURE 


For cladistic biogeography, the facts of geo- 
graphic distribution of organisms are coher- 
ent patterns related to, and explained by, his- 
torical processes of geographic change. This 
view had been epitomized in the slogans of 
Croizat (1964: 857-858) that “dispersal for-
ever repeats” and that “earth and life evolve
together.” During recent decades this view
was strengthened by developments both in
geology and in biology—by the revival of
continental drift (plate tectonics) and by the
rejuvenation of systematics (cladistics). These
developments did not in themselves render 
more coherent the facts of geographic distri- 
bution of organisms, but they heightened ex- 
pectations that such discovery was within 
reach of empirical investigation. 

For a long time, the expectations had been 
in existence. Early in the modern history of 
cladistics they were reformulated (Hennig, 
1960, 1966) and given significant impetus 
(Brundin, 1966). Subsequent developments 
within cladistics offered the hope that geo- 
graphic data, when associated with nodes of 
cladograms generally and when analyzed by 
the exact methods of parsimony analysis, 
would prove coherent—even convincing to 
other, ecologically oriented, biogeographers
and to biologists in general. This hope, still 
persistent today, seems to have been realized 
only to a limited degree. It is doubtful, for 
example, that the accumulated findings of 
cladistic biogeography of the last two decades 
have proven any more convincing than those 
of Croizat of the previous few decades (bib- 
liography in Craw and Gibbs, 1984; com- 
ment by Seberg, 1986; Platnick and Nelson, 
1989; Mayden, 1991; Nelson, 1994). 

Subtrees simplify the cladistic analysis of 
geographic data. Subtree analysis does so by 
identifying paralogous nodes so that geo- 
graphic data need not be associated with par- 
alogous nodes. In this respect the rationale 
of subtree analysis is at variance with points 
of view, including our own at times, that have 
surfaced during the recent history of cladistic 
biogeography. In retrospect, these points of 
view seem tacitly to assume that nodes are 
composed (in part) of geographic data and 
that troubling variation of geographic data 
from node to node is random among clado- 
grams in general. The contrasting assumption 
of subtree analysis is that troubling variation 
among geographic data that might be asso- 
ciated with some nodes is merely the effect 
of paralogy, which is nonrandom and in- 
creases basally in cladograms in general. 
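
The notion of a paralogous node used here (a node whose descendant branches overlap in geographic distribution) can be illustrated with a short check. The following is a minimal sketch under stated assumptions: internal nodes are Python lists, terminal taxa are sets of area labels, and the function names and the example cladogram are hypothetical. It only flags paralogous nodes; it does not reproduce the subtree algorithm itself.

```python
def node_areas(node):
    """All areas occupied by the taxa under a node."""
    if isinstance(node, (set, frozenset)):
        return set(node)
    total = set()
    for child in node:
        total |= node_areas(child)
    return total


def paralogous_nodes(node, path="root"):
    """Paths of internal nodes whose child branches share at least one area."""
    if isinstance(node, (set, frozenset)):
        return []
    flagged = []
    seen = set()
    overlap = False
    for child in node:
        child_areas = node_areas(child)
        if seen & child_areas:
            overlap = True
        seen |= child_areas
    if overlap:
        flagged.append(path)
    for index, child in enumerate(node):
        flagged += paralogous_nodes(child, f"{path}.{index}")
    return flagged


# Hypothetical cladogram: one taxon in area A, sister to ((A, B), (B, C)).
example = [{"A"}, [[{"A"}, {"B"}], [{"B"}, {"C"}]]]
flagged = paralogous_nodes(example)
print(flagged)  # prints: ['root', 'root.1']
```

In this small example the two terminal pairs are paralogy free while both deeper nodes are flagged, in keeping with the observation that paralogy increases basally.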

Subtree analysis captures some, at times 
many, of the routine practices of biogeo- 
graphic analysis, for example the practice, 
seemingly universal, of comparing clado- 
grams of different groups of organisms for the 
purpose of identifying such common pat- 
terns—or such different patterns—as might 
be present in the associated geographic data 
(Grande, 1994). 

No one would argue that there is more than 
one tree of life. The rationale of subtree anal- 
ysis addresses the question of how that one 
tree—to the degree that it is currently 
known—is best subdivided for the purposes 
of comparison and geographic analysis. To 
our knowledge, this question has never before 
been addressed, nor has any algorithm (ra- 
tionale) been previously offered as its answer. 
Such an algorithm seems required if cladistic 
biogeography is to have a rational basis. The 
lack of such basis perhaps explains the frus- 
tration sometimes expressed, for good rea- 
son, with results of current methods of anal-
ysis (e.g., by Morrone and Carpenter, 1994).
With such basis, one may evaluate, even im- 
prove upon, the division of that one tree into 
two or more parts (subtrees) that might be 
assumed comparable by whomsoever. 

Heretofore, the division, accomplished ar- 
bitrarily in all cases known to us, has been 
into groups (taxa) independent in the sense 
that they have no nodes in common. Subtree 
analysis imposes no such requirement, and 
one or more nonterminal node may appear 
as an element common to two or more sub- 
trees. This novel feature of subtree analysis 
may appear counterintuitive because inde- 
pendence of subtrees is subsequently as- 
sumed under the rationale of parsimony anal- 
ysis. 

Here we forgo discussion of whether par-
simony analysis (of areas) is essential to, or
ultimately meaningful for, cladistic bioge- 
ography. Relevant, nevertheless, is that the 
subtree algorithm requires no computer pro- 
gram and can be implemented by hand even 
for complex cladograms. The resulting sub- 
trees tend to be simple, and their meaning 
and consistency (or mutual conflict) are gen- 
erally evident without parsimony analysis. 

Subtree analysis offers apparent advantag- 
es over existing procedures, as is evident in 
the above analyses. Some data that seemed 
obscure (e.g., the GERM data reviewed by 
Morrone and Carpenter) are rendered clear 
and simple. Other data that seemed clear 
enough but otherwise suspect (e.g., those of 
Edmunds) are given rational justification. 
These appealing features seem general prop- 
erties of subtree analysis, which might not be 
the ultimate solution for cladistic biogeog- 
raphy but is possibly a step in that direction. 


SUMMARY 


1. Nodes of cladograms of organisms are 
not directly informative of geographic rela- 
tionships that might exist between organ- 
isms. For nodes to be informative, geograph- 
ic data must first be deliberately associated 
with them. Nodes (taxa) relating organisms 
in geographic areas that overlap are deemed 
“paralogous” (by analogy with the “‘paralo- 
gy” of molecular biology). Geographic data 
need not be associated with paralogous nodes. 

2. Subtree analysis is a novel method of 
potential value in cladistic biogeography. It
proceeds by reducing one more or less com-
plex cladogram to one or more subtree that
is paralogy free in the geographic sense. Geo- 
graphic data are then associated with infor- 
mative nodes of each subtree. Data so asso- 
ciated appear to be the only data actually 
relevant to cladistic biogeography. 

3. Subtree analysis accords with the general 
practice of comparing cladograms for two or 
more taxa for the purpose of cladistic analysis 
of geographic relationship. Because clado- 
grams for two or more groups of organisms 
are parts of the one cladogram embracing all 
of life, subtree analysis provides a rationale 
for subdividing that one cladogram for that 
purpose. 

4. An algorithm for subtree analysis was 
developed, implemented in a preliminary 
MS-DOS program, and applied to the bench- 
mark studies in cladistic biogeography of 
Brundin (1966) and Mayden (1988), as well 
as to cladograms in studies reviewed by Craw 
(1989), Page and Lydeard (1994), and Mor- 
rone and Carpenter (1994). 

5. Data associated with nodes of subtrees 
were represented in matrices of two kinds 
(component and three item) for the purpose 
of parsimony analysis, which generally yield- 
ed the same trees of high consistency, ap- 
proaching or reaching 100%. Problems pre- 
viously encountered with parsimony analysis 
of geographic data, as reported, for example,
in the review by Morrone and Carpenter 
(1994), are largely the effects of geographic 
paralogy and disappear with subtree analysis. 

6. Representation in matrices of geograph- 
ic data (associated with nodes of subtrees) 
usually entails numerous missing-data en- 
tries. Missing data commonly permit current 
programs (Hennig86, PAUP) to save over- 
resolved trees (with spurious nodes) that pre- 
vent straightforward realization of an infor- 
mative area cladogram. In such cases, an in- 
formative result is generally obtainable by 
manual collapse of spurious nodes and more 
basal placement, by trial and error, of single 
areas and combinations of them. 

7. Geographic data have sometimes been 
associated with widespread species and rep- 
resented in a matrix for parsimony analysis. 
Such data apparently add little or nothing 
(except lower consistency and spurious res- 
olution) to results achievable by parsimony 
analysis of a matrix for data associated with 
nodes only. 
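
The component matrices referred to in points 5 and 6 (rows for areas, columns for nodes, with missing-data entries) can be sketched as follows. This is a minimal illustration under stated assumptions, not the preliminary MS-DOS program: the coding convention (1 for areas within a node, 0 for areas of the same subtree outside the node, ? for areas absent from the subtree) is assumed, the function names are hypothetical, and the three subtrees are invented for the example.

```python
def areas(tree):
    """Set of area labels under a node (a string is a terminal area)."""
    if isinstance(tree, str):
        return {tree}
    found = set()
    for child in tree:
        found |= areas(child)
    return found


def informative_nodes(tree):
    """Area sets of internal nodes other than the root, in preorder."""
    found = []

    def walk(node, is_root):
        if isinstance(node, str):
            return
        if not is_root:
            found.append(areas(node))
        for child in node:
            walk(child, False)

    walk(tree, True)
    return found


def component_matrix(subtrees, all_areas):
    """Map each area to its row of '1', '0', and '?' entries, one column per node."""
    columns = []
    for tree in subtrees:
        present = areas(tree)
        for clade in informative_nodes(tree):
            columns.append({
                area: "1" if area in clade else "0" if area in present else "?"
                for area in all_areas
            })
    return {area: "".join(column[area] for column in columns) for area in all_areas}


# Hypothetical subtrees; the third lacks area B, which is therefore scored "?".
subtrees = [("B", ("A", ("C", "D"))), ("D", ("A", ("B", "C"))), ("A", ("C", "D"))]
matrix = component_matrix(subtrees, "ABCD")
for area, row in matrix.items():
    print(area, row)
```

Areas missing from a subtree accumulate "?" entries, which is the source of the numerous missing-data entries mentioned in point 6.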